Checking for missing values on dataframe compared to reference dataframe

Question

I have two dataframes that I imported from CSV. The first is the reference df with all the answers:

             state
0          Alabama
1           Alaska
2          Arizona

The second df has several columns containing data that I want to cross-check against the first df:

    column1     column2
0   Arizona  Washington
1  New York    New York
2       NaN        Utah

I basically want a third df that shows, under column1 and column2, the items from my reference df that are missing from each of those columns. All I could find was how to compare identically shaped dfs for differences. In my case, though, the second df has fewer rows and its items are in a different order.

Zunderding · Accepted Answer · 2021-11-04 14:55:41Z

After going through several other Stack Overflow questions, some thinking, and some trial and error, I found a solution:

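import pandas

# Sample data from the question; in practice both frames come from pandas.read_csv().
data_reference = pandas.DataFrame({"state": ["Alabama", "Alaska", "Arizona"]})
data_imported = pandas.DataFrame({
    "column1": ["Arizona", "New York", float("nan")],
    "column2": ["Washington", "New York", "Utah"],
})

columns = list(data_imported.columns)  # check every column of the imported df

missing = {}
for column in columns:
    col = []
    present = set(data_imported[column].dropna())  # values seen in this column
    for i, row in data_reference.iterrows():
        if row['state'] not in present:
            col.append(row['state'])
    missing[column] = pandas.Series(col)

# Build the frame in one step so that columns of different lengths are padded
# with NaN; assigning them one by one to an empty df truncates the longer ones.
df1 = pandas.DataFrame(missing)

print(df1)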

If somebody comes up with a more elegant version (I read that .iterrows() is best avoided), please share.
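One possible way to avoid .iterrows() is to let pandas do the membership test with Series.isin. The following is only a sketch, assuming the sample data from the question; the helper name missing_states is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumption: real data comes from CSV).
data_reference = pd.DataFrame({"state": ["Alabama", "Alaska", "Arizona"]})
data_imported = pd.DataFrame({
    "column1": ["Arizona", "New York", float("nan")],
    "column2": ["Washington", "New York", "Utah"],
})


def missing_states(reference, imported):
    """Hypothetical helper: for each column of `imported`, list the values
    of the `reference` Series that do not appear in that column."""
    missing = {}
    for column in imported.columns:
        # True where a reference value is absent from this column.
        mask = ~reference.isin(imported[column].dropna())
        # Reset the index so the results of all columns line up from row 0.
        missing[column] = reference[mask].reset_index(drop=True)
    # Columns of different lengths are padded with NaN.
    return pd.DataFrame(missing)


result = missing_states(data_reference["state"], data_imported)
print(result)
```

There is still a loop over the columns, but the per-row work is handled by isin, which is usually what people mean when they advise against .iterrows().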

edited Nov 4, 2021 at 14:55

answered Nov 4, 2021 at 13:50

Zunderding
