
I currently have two dataframes that share a matching column. For example:

Dataframe 1 with columns: A, B, C

Dataframe 2 with column: A

I want to keep all rows of the first dataframe whose value in column A also appears in column A of the second dataframe. For example, if df1 and df2 are:

df1

A B C
0 1 3
4 2 5
6 3 1
8 0 0
2 1 1

df2
A
4
6
1

So in this case, I want to keep only the second and third rows of df1. I tried doing it like this, but it didn't work since both dataframes are pretty big:

for index, row in df1.iterrows():
    counter = 0
    for index2,row2 in df2.iterrows():
        if row["A"] == row2["A"]:
            counter = counter + 1
    if counter == 0:
        df1.drop(index, inplace=True)

2 Answers


Use isin to test for membership:

In [176]:
df1[df1['A'].isin(df2['A'])]

Out[176]:
   A  B  C
1  4  2  5
2  6  3  1
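
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample data and applies the same isin filter. The way df1 and df2 are constructed, and the names mask and filtered, are assumptions made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question (assumed layout).
df1 = pd.DataFrame(
    {"A": [0, 4, 6, 8, 2], "B": [1, 2, 3, 0, 1], "C": [3, 5, 1, 0, 1]}
)
df2 = pd.DataFrame({"A": [4, 6, 1]})

# Boolean Series: True where df1's A value appears anywhere in df2's A column.
mask = df1["A"].isin(df2["A"])

# Boolean indexing keeps the matching rows and their original index labels.
filtered = df1[mask]
print(filtered)
```

Because the membership test is vectorized, this avoids the nested iterrows loops from the question, which compare every row of df1 against every row of df2.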


Or use the merge method:

import pandas

df1 = pandas.DataFrame([[0,1,3],[4,2,5],[6,3,1],[8,0,0],[2,1,1]], columns=['A', 'B', 'C'])
df2 = pandas.DataFrame([4,6,1], columns=['A'])
df2.merge(df1, on='A')  # default inner join: keeps only A values present in both

1 Comment

Interesting thing is that when you merge, pandas creates a new dataframe whose index is unrelated to the index on either the left or the right side of the merge. This is to accommodate possible duplicates on either side. Using isin, by contrast, is an exercise in filtering one dataframe by the values in another. That said, this is useful!
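
To make the difference concrete, here is a small sketch comparing the two approaches when the lookup frame contains a repeated value. The frame df2_dup (with 4 listed twice) is hypothetical and not part of the question; the other names are likewise chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[0, 1, 3], [4, 2, 5], [6, 3, 1], [8, 0, 0], [2, 1, 1]],
    columns=["A", "B", "C"],
)
# Hypothetical lookup frame in which the value 4 appears twice.
df2_dup = pd.DataFrame({"A": [4, 4, 6, 1]})

# isin filters df1 in place: original index labels kept, no rows multiplied.
by_isin = df1[df1["A"].isin(df2_dup["A"])]

# merge builds a new frame: fresh 0..n-1 index, one row per matching pair.
by_merge = df2_dup.merge(df1, on="A")

# Dropping duplicate keys first avoids the row multiplication.
by_merge_unique = df1.merge(df2_dup.drop_duplicates(), on="A")

print(by_isin.index.tolist())   # [1, 2]
print(by_merge.index.tolist())  # [0, 1, 2]
print(len(by_merge_unique))     # 2
```

So if the goal is only to filter df1, isin is probably the safer choice; merge fits better when columns from the second dataframe are needed as well.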
