Pandas match on multiple columns and get matching values as a single new column

Question

I have a dataframe with 5 columns. The value I am looking to match could be present in any of the last 3 columns.

Key | col1 | col2 | col3 | col4
----+------+------+------+-----
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
3   | fgh  | 20   | 22   | 23
4   | lmn  | 20   | 22   | 21

I am filtering for the value 21 in any of the last three columns as follows:

df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]

which gives me

Key | col1 | col2 | col3 | col4
----+------+------+------+-----
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
4   | lmn  | 20   | 22   | 21

Using this new df1, I want to get this:

Key | col1 | newCol
----+------+-------
1   | abc  | 21
2   | cde  | 21
4   | lmn  | 21

Basically, I want the matched value as the new column's value. How do I do this using pandas? I appreciate the help. I was thinking maybe I should filter and map it to the new column at the same time, but I don't know how.

From the second dataframe, how would you know which value you've filtered for? In this case it could have been either 21 or 22. – jpp, Commented Feb 21, 2018 at 19:55

If you know what value you're matching on, why can't you create the new column as this value? Or are you asking to pull out the common values that exist in the 3 resulting columns (without knowing that it's '21')? – AdmiralWen, Commented Feb 21, 2018 at 19:58

Right, I don't know which one. So I was thinking maybe I should filter and map it to the new column at the same time, but I don't know how. – Conquest, Commented Feb 21, 2018 at 19:58

Zero · 2018-02-22 04:51:35Z

6

Use

In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
   Key col1  newcol
0    1  abc      21
1    2  cde      21
3    4  lmn      21

Details

Run an equality check with eq on the relevant ['col2', 'col3', 'col4'] columns:

In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
    col2   col3   col4
0   True  False  False
1  False   True  False
2  False  False  False
3  False  False   True

any(axis=1) returns whether any element in each row is True:

In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0     True
1     True
2    False
3     True
dtype: bool

Use .loc to select the matched rows and the needed ['Key', 'col1'] columns:

In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
   Key col1
0    1  abc
1    2  cde
3    4  lmn

Finally, .assign(newcol=21) adds a newcol column with every value set to 21.
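Putting those steps together, here is a self-contained sketch of the same approach. It assumes integer columns (as in jpp's sample data) and introduces two illustrative variables, `value` and `cols`, so that the filter value and the new column's value are defined in one place:

```python
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

value = 21                        # illustrative: the value to search for
cols = ['col2', 'col3', 'col4']   # illustrative: columns that may contain it

# True for each row where at least one of `cols` equals `value`
mask = df[cols].eq(value).any(axis=1)

# Keep the matching rows and identifying columns, then record the value
result = df.loc[mask, ['Key', 'col1']].assign(newcol=value)
print(result)
```

Because the same `value` drives both the filter and `assign`, the new column always reflects what was actually searched for.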

edited Feb 22, 2018 at 4:51

answered Feb 21, 2018 at 19:59

Zero


Jon Clements Over a year ago

imho - this is definitely the best answer here - if it had more explanation though - that'd be great :)

jpp · Accepted Answer · 2018-02-21 20:00:40Z

2

Here is one way.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

# Combine the per-column masks: (col2 == 21) | (col3 == 21) | (col4 == 21)
mask = np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])

df2 = (df[mask]
       .assign(newCol=21)                          # add the filter value as a column
       .drop(['col2', 'col3', 'col4'], axis=1))    # axis=1: drop columns, not rows

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation

Store integers as integers rather than strings, and compare against 21 rather than '21'.
np.logical_or.reduce applies your | condition across the list of per-column masks built by the list comprehension.
assign creates a new column holding the filter value.
drop removes the unwanted columns; axis=1 means drop columns rather than rows.
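A quick sketch on the same sample data that checks two of the points above: comparing integer columns against the string '21' matches nothing, and np.logical_or.reduce over the per-column masks gives the same result as chaining | by hand. The names `string_mask`, `chained` and `reduced` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])
cols = ['col2', 'col3', 'col4']

# Integer columns compared with a string: no element compares equal
string_mask = (df['col2'] == '21') | (df['col3'] == '21') | (df['col4'] == '21')
print(string_mask.any())

# Chained | versus reducing a list of boolean masks with np.logical_or
chained = (df['col2'] == 21) | (df['col3'] == 21) | (df['col4'] == 21)
reduced = np.logical_or.reduce([df[col] == 21 for col in cols])
print(np.array_equal(chained.to_numpy(), reduced))
```

The reduce form is handy when the list of columns is long or built dynamically.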

answered Feb 21, 2018 at 20:00

jpp


Conquest Over a year ago

Both of these work, but I can only accept one. Accepting this one for the explanation. I will upvote the other answer from @Zero. I greatly appreciate your help.

AdmiralWen · 2018-02-21 20:13:09Z

As jpp pointed out, there are 2 possibilities here: both 21 and 22 appear in every filtered row. Assuming you don't know which one you're really looking for, you can use set() to isolate the unique values of each column, then use set.intersection() to find the values common to all of them:

import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])

# Unique values in each column (tolist() yields plain Python ints)
s1 = set(df['col2'].tolist())
s2 = set(df['col3'].tolist())
s3 = set(df['col4'].tolist())

# Values present in all three columns, stored as a string in every row
df['new_col'] = str(s1.intersection(s2, s3))
df

col1    col2    col3    col4    new_col
   a    21      22      23      {21, 22}
   b    22      21      20      {21, 22}
   c    20      22      21      {21, 22}
   d    21      21      22      {21, 22}
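The same idea extends to any list of columns by unpacking one set per column into set.intersection. A minimal sketch on AdmiralWen's sample data; the column list `cols` and the name `common` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

cols = ['col2', 'col3', 'col4']

# One set of unique values per column, then the values common to all of them
common = set.intersection(*(set(df[col].tolist()) for col in cols))
print(sorted(common))
```

Adding or removing a column only means editing `cols`, rather than writing another sN variable.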
