I have multiple old tables that I need to compare to new tables, and I want to use pandas to compare them.
The data looks something like this:
data_old = [
["John", 1,2,3,None],
["Mary", 1,4,None,"Apples"],
["Jean", None,4,-4,"Peaches"],
]
columns = ['name', "A", "B", "C", "D"]
data_new = [
["Jean", 5,4,-4,"Peaches"],
["John", 1,-2,3,None],
["Mary", 1,4,None,"Apples"],
]
Here we have two relatively small datasets, and the values can be text, numeric, or NULL. I wanted to create sub-sets dataframe with the changes and export report dataframes to a csv.
My issue is this, when I sort both dataframes by 'name', I get the following error:
ValueError: Can only compare identically-labeled DataFrame objects
Here is what I am doing:
df_old = pd.DataFrame(data=data_old, columns=columns)
df_old.sort(columns='name', inplace=True)
df_new = pd.DataFrame(data=data_new, columns=columns)
df_new.sort(columns='name', inplace=True)
ne = (df_old != df_new).any(1) #ERROR
# to other stuff.....