5

I have a dataframe with 5 columns. The value I am looking to match could be present in any of the last three columns.

Key | col1 | col2 | col3 | col4
--------------------------------
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
3   | fgh  | 20   | 22   | 23
4   | lmn  | 20   | 22   | 21

I am filtering for the value 21 in any of the last three columns as follows:

df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]

which gives me

Key | col1 | col2 | col3 | col4
--------------------------------
1   | abc  | 21   | 22   | 23
2   | cde  | 22   | 21   | 20
4   | lmn  | 20   | 22   | 21

Using this new df1, I want to get this:

Key | col1 | newCol
--------------------
1   | abc  | 21
2   | cde  | 21
4   | lmn  | 21

Basically, whichever column matched should supply the new column's value. How do I do this using pandas? I was thinking that maybe I should filter and map the value to the new column at the same time, but I don't know how. I appreciate the help.

3
  • From the second dataframe, how would you know which value you've filtered for? In this case it could have been either 21 or 22. Commented Feb 21, 2018 at 19:55
  • If you know what value you're matching on, why can't you create the new column as this value? Or are you asking to pull out the common values that exist in the 3 resulting columns (without knowing that it's '21')? Commented Feb 21, 2018 at 19:58
  • Right, I don't know which one. So I was thinking maybe I should filter and map it to the new column at the same time, but I don't know how. Commented Feb 21, 2018 at 19:58

3 Answers 3

6

Use

In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
   Key col1  newcol
0    1  abc      21
1    2  cde      21
3    4  lmn      21

Details

eq runs an element-wise equality check on the relevant ['col2', 'col3', 'col4'] columns:

In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
    col2   col3   col4
0   True  False  False
1  False   True  False
2  False  False  False
3  False  False   True

any with axis=1 returns, for each row, whether any element in that row is True:

In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0     True
1     True
2    False
3     True
dtype: bool

Use .loc to select the matched rows and only the ['Key', 'col1'] columns:

In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
   Key col1
0    1  abc
1    2  cde
3    4  lmn

Finally, .assign(newcol=21) adds a newcol column set to 21 in every remaining row.
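
For anyone who wants to run the steps above outside an IPython session, here is a minimal self-contained sketch. The helper name filter_and_tag and its parameters are made up for illustration and are not part of pandas; the data is the integer version of the frame from the question.

```python
import pandas as pd

# Sample data from the question, stored as integers.
df = pd.DataFrame(
    [[1, "abc", 21, 22, 23],
     [2, "cde", 22, 21, 20],
     [3, "fgh", 20, 22, 23],
     [4, "lmn", 20, 22, 21]],
    columns=["Key", "col1", "col2", "col3", "col4"],
)


def filter_and_tag(frame, value, search_cols, keep_cols, new_col="newcol"):
    """Hypothetical helper: keep rows where `value` occurs in any of
    `search_cols`, return only `keep_cols`, and add `new_col` set to `value`."""
    # Element-wise comparison, then collapse each row to a single boolean.
    mask = frame[search_cols].eq(value).any(axis=1)
    # Select matching rows and wanted columns, then add the constant column.
    return frame.loc[mask, keep_cols].assign(**{new_col: value})


result = filter_and_tag(df, 21, ["col2", "col3", "col4"], ["Key", "col1"])
print(result)
```

Passing axis as a keyword keeps the call working on newer pandas releases, where positional axis arguments to any are no longer accepted.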


1 Comment

imho - this is definitely the best answer here - if it had more explanation though - that'd be great :)
2

Here is one way.

import pandas as pd, numpy as np

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
        .assign(newCol=21)\
        .drop(['col2', 'col3', 'col4'], axis=1)

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation

  • Store integers as integers rather than strings.
  • np.logical_or.reduce applies your | condition across a list comprehension.
  • assign creates a new column with the filter value.
  • drop removes unwanted columns, axis=1 refers to columns.
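
The first point matters because the filter in the question compares against the string '21'. If the columns really do hold strings, one possible approach (an assumption about the data, since the question does not show its dtypes) is to convert them with pd.to_numeric before filtering. The names raw, typed and value_cols below are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed starting point: the three value columns were read in as strings.
raw = pd.DataFrame(
    [[1, "abc", "21", "22", "23"],
     [2, "cde", "22", "21", "20"],
     [3, "fgh", "20", "22", "23"],
     [4, "lmn", "20", "22", "21"]],
    columns=["Key", "col1", "col2", "col3", "col4"],
)
value_cols = ["col2", "col3", "col4"]

# Convert each value column to a numeric dtype, leaving `raw` untouched.
typed = raw.copy()
typed[value_cols] = typed[value_cols].apply(pd.to_numeric)

# OR together one boolean Series per column, as in the answer above.
mask = np.logical_or.reduce([typed[col] == 21 for col in value_cols])

df2 = typed[mask].assign(newCol=21).drop(columns=value_cols)
print(df2)
```

drop(columns=...) is equivalent to drop(..., axis=1) and avoids having to remember which axis number means columns.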

1 Comment

Both of these work, but I can only accept one. Accepting this for the explanation. I will upvote the other answer from @Zero. I greatly appreciate your help.
0

As jpp pointed out, you have 2 possibilities here: both 21 and 22 are common across all 3 columns. Assuming you don't know which one you're really looking for, what you can do is to use set() to isolate the unique values for each column, then use set.intersection() to find the commonalities:

import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])

s1 = set(df['col2'].values)
s2 = set(df['col3'].values)
s3 = set(df['col4'].values)

df['new_col'] = str(s1.intersection(s2, s3))
df

col1    col2    col3    col4    new_col
   a    21      22      23      {21, 22}
   b    22      21      20      {21, 22}
   c    20      22      21      {21, 22}
   d    21      21      22      {21, 22}
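
Note that the snippet above stores the same string in every row. If the goal is instead one tagged result per candidate value, a possible variation is to keep the intersection as a real set and filter once per value. This is only a sketch under that assumption; common, frames and tagged are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])
value_cols = ['col2', 'col3', 'col4']

# Values that occur somewhere in every one of the three columns.
common = set.intersection(*(set(df[col]) for col in value_cols))

# For each common value, keep the rows that contain it and tag them with it.
frames = [
    df.loc[df[value_cols].eq(value).any(axis=1), ['col1']].assign(new_col=value)
    for value in sorted(common)
]
tagged = pd.concat(frames, ignore_index=True)
print(tagged)
```

With this sample data every row contains both 21 and 22, so each row appears once per value in the result.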
