Pandas: How to check if two DataFrames are equal
Suppose the two DataFrames have identical column names.
import pandas as pd
df_a = pd.read_csv("df_a.tsv", sep="\t")
df_a = df_a.set_index("xxx").sort_index()
df_b = pd.read_csv("df_b.tsv", sep="\t")
df_b = df_b.set_index("xxx").sort_index()
>>> df_a.equals(df_b)
True
>>> # Reduce over rows, then columns; the built-in all() would only iterate over column labels.
>>> (df_a == df_b).all().all()
True
sort_index() is a MUST because DataFrame.equals() does not align records by index label before comparing. Instead it compares the two DataFrames position by position, so the same records in a different row order are reported as unequal. df_a == df_b is also an element-wise comparison, but if the indices of the two DataFrames are not exactly the same (in both values and order), it throws ValueError: Can only compare identically-labeled DataFrame objects.
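The behavior described above can be reproduced without any files. The following is a minimal sketch, assuming two small in-memory DataFrames as hypothetical stand-ins for df_a.tsv and df_b.tsv; the column name "value" and the data are made up for illustration. Both frames hold the same records, but in a different row order.

```python
import pandas as pd

# Hypothetical stand-ins for the two TSV files: same records, different row order.
df_a = pd.DataFrame({"xxx": [1, 2, 3], "value": [10, 20, 30]}).set_index("xxx")
df_b = pd.DataFrame({"xxx": [3, 1, 2], "value": [30, 10, 20]}).set_index("xxx")

# Without sorting, equals() compares position by position and reports a mismatch.
unsorted_equal = bool(df_a.equals(df_b))

# Without sorting, == refuses to compare because the row labels are ordered differently.
try:
    df_a == df_b
    raised = False
except ValueError:
    raised = True

# After sorting both frames by index, the two checks agree.
sorted_a, sorted_b = df_a.sort_index(), df_b.sort_index()
sorted_equal = bool(sorted_a.equals(sorted_b))
elementwise_equal = bool((sorted_a == sorted_b).all().all())

print(unsorted_equal, raised, sorted_equal, elementwise_equal)
```

Note that Python's built-in all() applied to a DataFrame iterates over its column labels rather than its cell values, which is why the sketch reduces the element-wise result with .all().all() instead.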