Take two data frames:

print(df1)

   A  B
0  a  1
1  a  3
2  a  5
3  b  7
4  b  9
5  c  11
6  c  13
7  c  15

print(df2)

   C     D
a  apple 1
b  pear  1
c  apple 1

So the values in column df1['A'] are index labels of df2.

I want to select the rows in df1 whose value in column A corresponds to 'apple' in df2['C'], resulting in:

   A  B
0  a  1
1  a  3
2  a  5
5  c  11
6  c  13
7  c  15

1 Answer

Made many edits due to comments and question edits. Basically, you first extract the index labels of df2 by filtering that data frame on the values in C, then filter df1 by those labels with isin:

indexes = df2[df2['C']=='apple'].index
df1[df1['A'].isin(indexes)]
>>>
   A  B
0  a  1
1  a  3
2  a  5
5  c  11
6  c  13
7  c  15
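
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The construction of df1 and df2 is an assumption reconstructed from the printed frames in the question (in particular, that 'a', 'b', 'c' are the index of df2); the name `result` is only for illustration.

```python
import pandas as pd

# Rebuild the two frames as printed in the question (assumed layout).
df1 = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "B": [1, 3, 5, 7, 9, 11, 13, 15],
})
df2 = pd.DataFrame(
    {"C": ["apple", "pear", "apple"], "D": [1, 1, 1]},
    index=["a", "b", "c"],
)

# Step 1: index labels of df2 whose C value is 'apple'.
indexes = df2[df2["C"] == "apple"].index

# Step 2: keep the rows of df1 whose A value is one of those labels.
result = df1[df1["A"].isin(indexes)]
print(result)
```

With the sample data, `indexes` holds the labels 'a' and 'c', so the rows of df1 with A equal to 'b' are dropped.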

UPDATE

If you want to minimize memory allocation, try to avoid storing intermediate results (note that I am not sure this will solve your memory allocation issue, because I don't have full details of the situation, and I may not be well placed to provide a solution):

df1[df1['A'].isin( df2[df2['C']=='apple'].index)]
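
A possible variation on the one-liner above, offered as a sketch rather than a confirmed fix: `df2[df2['C']=='apple']` builds a filtered copy of every column of df2 before `.index` is taken, whereas applying the boolean mask to `df2.index` directly only materializes the matching labels. Whether this is enough to avoid the memory error is not known, since the cause was never identified. The sample frames and the names `mask`, `apple_labels`, and `result` are assumptions for illustration.

```python
import pandas as pd

# Sample frames reconstructed from the question (assumed layout).
df1 = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "B": [1, 3, 5, 7, 9, 11, 13, 15],
})
df2 = pd.DataFrame(
    {"C": ["apple", "pear", "apple"], "D": [1, 1, 1]},
    index=["a", "b", "c"],
)

# Boolean mask over df2's rows, as a plain NumPy array.
mask = (df2["C"] == "apple").to_numpy()

# Select only the matching index labels; the other columns of df2
# are never copied into an intermediate data frame.
apple_labels = df2.index[mask]

# Filter df1 by membership in those labels, as in the answer.
result = df1[df1["A"].isin(apple_labels)]
print(result)
```

The result is the same as with the original expression; only the intermediate objects differ.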

3 Comments

Thanks. This works for my smaller data frames, but my df1 has 600 rows and my df2 has 700,000 rows, and I get a memory error.
Thanks for your help, but I still get a memory error. I'll try structuring the data frames differently.
Good luck! Was nice to help
