2

I have following two dataframes:

df1:

name
abc
lmn
pqr

df2:

m_name  n_name  loc
abc     tyu     IND
bcd     abc     RSA
efg     poi     SL      
lmn     ert     AUS
nne     bnm     ENG
pqr     lmn     NZ
xyz     asd     BAN

I want to generate a new dataframe on following condition:

  1. if df2.m_name==df1.name or df2.n_name==df1.name

  2. eliminate duplicate rows

Following is desired output:

m_name  n_name  loc
abc     tyu     IND
bcd     abc     RSA
lmn     ert     AUS
pqr     lmn     NZ

Can I get any suggestions on how to achieve this??

2 Answers 2

2

Use:

print (df2)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
3    efg    poi   SL
4    lmn    ert  AUS
5    nne    bnm  ENG
6    pqr    lmn   NZ
7    xyz    asd  BAN

df3 = df2.filter(like='name')
#another solution is filter columns by columns names in list
#df3 = df2[['m_name','n_name']]
df = df2[df3.isin(df1['name'].tolist()).any(axis=1)]
df = df.drop_duplicates(df3.columns)
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

Details:

Seelct all columns with name by filter:

print (df2.filter(like='name'))
  m_name n_name
0    abc    tyu
1    abc    tyu
2    bcd    abc
3    efg    poi
4    lmn    ert
5    nne    bnm
6    pqr    lmn
7    xyz    asd

Compare by DataFrame.isin:

print (df2.filter(like='name').isin(df1['name'].tolist()))
   m_name  n_name
0    True   False
1    True   False
2   False    True
3   False   False
4    True   False
5   False   False
6    True    True
7   False   False

Get at least one True per row by any:

print (df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1))
0     True
1     True
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Filter by boolean indexing:

df = df2[df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1)]
print (df)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

And last remove duplicates drop_duplicates (If need remove dupes by all name columns add subset parameter)

df = df.drop_duplicates(subset=df3.columns)
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ
Sign up to request clarification or add additional context in comments.

Comments

2

Use

In [56]: df2[df2.m_name.isin(df1.name) | df2.n_name.isin(df1.name)]
Out[56]:
  m_name n_name  loc
0    abc    tyu  IND
1    bcd    abc  RSA
3    lmn    ert  AUS
5    pqr    lmn   NZ

Or using query

In [58]: df2.query('m_name in @df1.name or  n_name in @df1.name')
Out[58]:
  m_name n_name  loc
0    abc    tyu  IND
1    bcd    abc  RSA
3    lmn    ert  AUS
5    pqr    lmn   NZ

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.