I have two dataframes which I've imported from a CSV file. The first is the reference df with all the answers:

             state
0          Alabama
1           Alaska
2          Arizona

The second df has several columns containing data that I want to cross-check against the first df:

    column1     column2
0   Arizona  Washington
1  New York    New York
2       NaN        Utah

I would basically like a third df that shows, in column1 and column2, the items that are missing compared to my reference df. All I found was how to check identical dfs for differences. In my case, however, the second df has fewer rows and its items are in a different order.

1 Answer

After going through several other Stack Overflow questions, some thinking and some trial and error, I found a solution:

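import pandas

# `reference` and `imported` are placeholders for the data loaded from CSV.
data_reference = pandas.DataFrame(reference)
data_imported = pandas.DataFrame(imported)

# Check every column of the imported df instead of hard-coding the names.
columns = list(data_imported.columns)

dict1 = {}
for column in columns:
    col = []
    present = data_imported[column].unique()  # values found in this column
    for i, row in data_reference.iterrows():
        # Collect each reference state that is missing from this column.
        if row['state'] not in present:
            col.append(row['state'])
    dict1[column] = pandas.Series(col)

# Build the df in one go so that columns of different lengths are padded
# with NaN instead of being cut to the length of the first column.
df1 = pandas.DataFrame(dict1)

print(df1)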

If anybody comes up with a more elegant version (I have read that .iterrows() should be avoided), please share.
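One possible version without .iterrows(), offered as a sketch rather than the definitive approach: `Series.isin` tests every reference value against a column in one call, and the inverted mask selects the missing ones. The sample data from the question is rebuilt inline here instead of being read from CSV, and the helper name `missing_per_column` is made up for illustration.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
data_reference = pd.DataFrame({"state": ["Alabama", "Alaska", "Arizona"]})
data_imported = pd.DataFrame({
    "column1": ["Arizona", "New York", float("nan")],
    "column2": ["Washington", "New York", "Utah"],
})


def missing_per_column(reference, imported, key="state"):
    """Return, per column of `imported`, the `reference[key]` values absent from it.

    Hypothetical helper name, not part of pandas.
    """
    missing = {}
    for column in imported.columns:
        # Boolean mask: True where the reference value appears in this column.
        found = reference[key].isin(imported[column])
        # Keep the values not found; reset the index so every column starts at 0.
        missing[column] = reference.loc[~found, key].reset_index(drop=True)
    # Building from a dict of Series pads shorter columns with NaN.
    return pd.DataFrame(missing)


df_missing = missing_per_column(data_reference, data_imported)
print(df_missing)
```

The loop now only runs over the columns, not over the rows, and the order of the items in the imported df does not matter because `isin` is a pure membership test.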
