I have many DataFrames and need to compare their column names pairwise, position by position. Ideally, I want to return the positions of the columns that do not match, along with the names of those columns in each of the two DataFrames being compared.
Note: I only need to compare the columns, not the data.
Example:
import pandas as pd
df1 = pd.DataFrame(columns=['Name', 'Age', 'Address', 'Telephone'])
df2 = pd.DataFrame(columns=['Nombre', 'Age', 'Address', 'Telefono'])
df3 = pd.DataFrame(columns=['N.', 'A.', 'Address', 'Telephone', 'Email'])
df4 = pd.DataFrame(columns=['Name', 'Age', 'Address', 'Telephone'])
Expected output:
| DataFrame_A | DataFrame_B | Positions | Col_df_A | Col_df_B |
|---|---|---|---|---|
| df1 | df2 | 0,3 | ['Name', 'Telephone'] | ['Nombre', 'Telefono'] |
| df1 | df3 | 0,1,4 | ['Name', 'Age'] | ['N.', 'A.', 'Email'] |
What is the best way to do this?
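from itertools import combinations
dfs_dict = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}
# dict.keys() returns a view with no .tolist(); list() gives the names directly
dfs_names = list(dfs_dict)
# use name_a/name_b so the loop does not overwrite the df1 and df2 DataFrames
for name_a, name_b in combinations(dfs_names, 2):
    ...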
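One possible way to fill in the loop body, as a minimal sketch rather than a definitive answer. It assumes columns are compared strictly by position, that a position present in only one DataFrame counts as a mismatch, and that pairs with identical columns are left out of the result. The helper name `compare_columns` and the `_MISSING` sentinel are hypothetical names introduced for illustration.

```python
from itertools import combinations, zip_longest

import pandas as pd

df1 = pd.DataFrame(columns=['Name', 'Age', 'Address', 'Telephone'])
df2 = pd.DataFrame(columns=['Nombre', 'Age', 'Address', 'Telefono'])
df3 = pd.DataFrame(columns=['N.', 'A.', 'Address', 'Telephone', 'Email'])
df4 = pd.DataFrame(columns=['Name', 'Age', 'Address', 'Telephone'])
dfs_dict = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}

# Sentinel marking a position that exists in only one of the two frames.
_MISSING = object()


def compare_columns(dfs):
    """Compare column names position by position for every pair of frames."""
    rows = []
    for name_a, name_b in combinations(dfs, 2):
        cols_a = list(dfs[name_a].columns)
        cols_b = list(dfs[name_b].columns)
        # zip_longest pads the shorter column list with the sentinel.
        paired = zip_longest(cols_a, cols_b, fillvalue=_MISSING)
        diffs = [(i, a, b) for i, (a, b) in enumerate(paired) if a != b]
        if not diffs:
            continue  # assumption: identical pairs are not reported
        rows.append({
            "DataFrame_A": name_a,
            "DataFrame_B": name_b,
            "Positions": ",".join(str(i) for i, _, _ in diffs),
            "Col_df_A": [a for _, a, _ in diffs if a is not _MISSING],
            "Col_df_B": [b for _, _, b in diffs if b is not _MISSING],
        })
    return pd.DataFrame(rows)


result = compare_columns(dfs_dict)
print(result)
```

Because only `.columns` is read, the data in each DataFrame is never touched. If identical pairs such as df1 and df4 should also appear in the output, the `continue` can be dropped so that they are reported with empty positions.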