Filtering Multiple Columns Containing a Certain Value

Question

I have a dataframe like this:

col1  col2  col3  col4
  A    W     Z     C
  F    W     P     F
  E    P     Y     C
  B    C     B     C
  M    A     V     C
  D    O     X     A
  Y    L     Y     D
  Q    V     R     A

I want to keep the rows in which any of several columns contains a certain value. For instance, I want to keep the rows that contain A. The result should be:

 col1  col2  col3  col4
  A    W     Z     C
  M    A     V     C
  D    O     X     A
  Q    V     R     A

Since this is just a small sample of a large dataset, I cannot write out a condition for every column like this:

df[(df['col1'].str.contains('A')) | (df['col2'].str.contains('A')) | (df['col3'].str.contains('A')) | 
(df['col4'].str.contains('A'))]

Is there any other way?

YOLO · Accepted Answer · 2020-01-15 18:23:51Z

4

Here's one way to do it:

df[df.applymap(lambda x: x == 'A').any(axis=1)]

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
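For readers who want to reproduce this, here is a minimal sketch that rebuilds the sample frame from the question and splits the filter into its two steps. It swaps applymap for DataFrame.eq, which performs the same element-wise comparison, and spells out axis=1 as a keyword, which recent pandas versions require. The variable names are only illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

# Step 1: element-wise comparison gives a boolean frame of the same shape.
mask = df.eq("A")

# Step 2: any(axis=1) collapses each row to True if at least one cell is True.
row_has_a = mask.any(axis=1)

# Boolean indexing keeps only the rows flagged True.
result = df[row_has_a]
print(result)
```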

For multiple values, such as A or B, you can do:

df[df.applymap(lambda x: x in ['A','B']).any(axis=1)]
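As a possible alternative for exact matches against a list, DataFrame.isin tests every cell against the list in one vectorized call, so no lambda is needed. This is a sketch on the question's sample data, not part of the original answer; the name wanted is just illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

wanted = ["A", "B"]

# isin returns a boolean frame: True where the cell equals any listed value.
# any(axis=1) then flags rows with at least one match.
result = df[df.isin(wanted).any(axis=1)]
print(result)
```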

edited Jan 15, 2020 at 18:23

answered Jan 15, 2020 at 18:03


3 Comments

realkes Over a year ago

Thanks for the answer @YOLO. Can you explain what the function of any(1) is?

YOLO Over a year ago

any(1) is shorthand for any(axis=1): it checks row-wise whether any 'A' is present and, if so, returns True for that row.

realkes Over a year ago

Thanks, this looks just fine. Just a small question: if this would be a list like ['A','B'] instead of just A, how can I make the filter? Because I cannot make an isin statement for a string.

ansev · Accepted Answer · 2020-01-15 18:25:13Z

4

We could use DataFrame.stack and Series.unstack with DataFrame.any:

df[df.stack(dropna=False).str.contains('A').unstack().any(axis=1)]

Or, a better solution suggested by @ALollz:

df[df.stack().str.contains('A').any(level=0)]

Output

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
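Note that the level argument of any was removed in pandas 2.0. The groupby(level=0) spelling that ALollz mentions in the comments gives the same row-wise mask and works on both older and newer versions. A minimal sketch on the question's sample data, with illustrative variable names:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

# stack() turns the frame into one long Series indexed by (row label, column name).
stacked = df.stack()

# Group by the first index level (the original row label) and ask whether
# any cell in that row contains 'A'.
mask = stacked.str.contains("A").groupby(level=0).any()

result = df[mask]
print(result)
```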

UPDATE

To check for several characters, join them with '|' into a single regex pattern:

df[df.stack().str.contains('|'.join(['A','B'])).any(level=0)]

  col1 col2 col3 col4
0    A    W    Z    C
3    B    C    B    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
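One caveat with the join approach: str.contains treats its pattern as a regular expression by default, so a value containing characters such as '.' or '+' would be read as regex syntax. A hedged sketch that escapes each value with re.escape before joining is shown below. It applies str.contains column by column instead of stacking, purely to keep the example short.

```python
import re

import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

values = ["A", "B"]

# Escape each value so regex metacharacters are matched literally,
# then join the alternatives with '|'.
pattern = "|".join(re.escape(v) for v in values)

# Apply the substring test to every column, then reduce row-wise.
mask = df.apply(lambda col: col.str.contains(pattern)).any(axis=1)

result = df[mask]
print(result)
```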

edited Jan 15, 2020 at 18:25

answered Jan 15, 2020 at 18:06


4 Comments

ALollz Over a year ago

You don't even need the unstack. any can operate over Index levels so: df.stack().str.contains('A').any(level=0) gets you the mask.

anky Over a year ago

@ALollz very nice, I should explore more :) I thought only aggregate functions had level as a param, but didn't consider any

ALollz Over a year ago

Yeah, any, sum and a few others support the level argument. I don't think there's any real gain of df.sum(level=0) as compared to df.groupby(level=0).sum(), other than it being less verbose.

ansev Over a year ago

You are right:) I always forget that I can do this @ALollz

anky · Accepted Answer · 2020-01-15 18:06:21Z

2

You can use apply and any along axis=1:

df[df.apply(lambda x: x.str.contains('A')).any(axis=1)]

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A

Or:

s=df.stack()
s[s.str.contains('A').groupby(level=0).transform('any')].unstack()

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
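The answers in this thread mix two kinds of test: == and isin match whole cell values, while str.contains matches substrings. On the question's single-letter data they agree, but they can differ on longer strings. The small frame below is made up purely to illustrate the difference; regex=False and na=False are optional str.contains arguments that request a literal match and treat missing cells as non-matches.

```python
import pandas as pd

# Hypothetical data with multi-character cells (not from the question).
df = pd.DataFrame({
    "col1": ["A", "AB", "C"],
    "col2": ["X", "Y", "BA"],
})

# Exact match: only cells equal to 'A' count.
exact = df[df.eq("A").any(axis=1)]

# Substring match: any cell that contains 'A' somewhere counts.
substring = df[
    df.apply(lambda col: col.str.contains("A", regex=False, na=False)).any(axis=1)
]

print(exact)
print(substring)
```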

edited Jan 15, 2020 at 18:06

answered Jan 15, 2020 at 18:04
