
I have a dataframe like this:

col1  col2  col3  col4
  A    W     Z     C
  F    W     P     F
  E    P     Y     C
  B    C     B     C
  M    A     V     C
  D    O     X     A
  Y    L     Y     D
  Q    V     R     A

I want to filter rows based on whether any of several columns has a certain value. For instance, I want to keep the rows that contain A. The result should be:

col1  col2  col3  col4
  A    W     Z     C
  M    A     V     C
  D    O     X     A
  Q    V     R     A

Since this is only a small sample of a large dataset, writing out a condition for every column like this isn't practical:

df[(df['col1'].str.contains('A')) | (df['col2'].str.contains('A')) |
   (df['col3'].str.contains('A')) | (df['col4'].str.contains('A'))]

Is there any other way?


3 Answers


Here's one way to do it:

df[df.applymap(lambda x: x == 'A').any(axis=1)]

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
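The lambda above only performs an equality test, so the same mask can be built with the vectorized DataFrame.eq, which avoids calling a Python function for every cell (applymap itself was renamed DataFrame.map in pandas 2.1). A minimal sketch, assuming the sample frame is rebuilt by hand from the question:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (one string per column)
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

# df.eq("A") compares every cell with "A" in one vectorized step and
# returns a boolean DataFrame with the same shape as df
hits = df.eq("A")

# any(axis=1) keeps a row if at least one of its cells is True
result = df[hits.any(axis=1)]
print(result)
```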

To match several values, for example A or B:

df[df.applymap(lambda x: x in ['A','B']).any(axis=1)]
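For a list of values, DataFrame.isin performs the membership test on every cell at once, so no lambda is needed. A minimal sketch on the question's sample data; the targets list and the column subset are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

# Values to look for; isin accepts any list-like
targets = ["A", "B"]

# Keep rows where any column holds one of the targets
result = df[df.isin(targets).any(axis=1)]

# Restricting the search to some columns works the same way;
# this particular column subset is only an example
subset_result = df[df[["col1", "col2"]].isin(targets).any(axis=1)]

print(result)
print(subset_result)
```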

3 Comments

Thanks for the answer @YOLO. Can you explain what the function of any(1) is?
any(1) is any(axis=1): it checks each row for at least one True value, i.e. whether the row contains an 'A', and returns True for that row if so.
Thanks, this works. One small question: if I had a list like ['A','B'] instead of just 'A', how would I filter? I can't call isin on a single string.
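To illustrate what any(1) does: the 1 is the axis, so any(axis=1) reduces across the columns and returns one boolean per row, while axis=0 returns one boolean per column. A small sketch on the sample data, with the boolean mask built by DataFrame.eq (an assumption standing in for the applymap lambda):

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

hits = df.eq("A")            # True wherever a cell equals "A"

per_row = hits.any(axis=1)   # one bool per row: does this row contain an "A"?
per_col = hits.any(axis=0)   # one bool per column: does this column contain an "A"?

print(per_row.tolist())  # [True, False, False, False, True, True, False, True]
print(per_col.tolist())  # [True, True, False, True]
```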

We could use DataFrame.stack and Series.unstack together with DataFrame.any:

df[df.stack(dropna=False).str.contains('A').unstack().any(axis=1)]

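or, a better solution suggested by @ALollz that skips the unstack and reduces over the row level of the stacked index directly (groupby(level=0) replaces the level argument of any, which was removed in pandas 2.0):

df[df.stack().str.contains('A').groupby(level=0).any()]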

Output

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
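Broken into steps, the stack-based mask looks like this. The sketch rebuilds the sample frame by hand and uses groupby(level=0).any() to reduce over the row level of the stacked index:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

# stack() turns the 8x4 frame into a 32-element Series indexed by
# (row label, column name)
stacked = df.stack()

# One boolean per cell
cell_hits = stacked.str.contains("A")

# Collapse index level 0 (the original row labels):
# True if any cell in that row matched
row_mask = cell_hits.groupby(level=0).any()

result = df[row_mask]
print(result)
```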

UPDATE

To check for several characters, join them into a single regex alternation with '|':

df[df.stack().str.contains('|'.join(['A','B'])).groupby(level=0).any()]

  col1 col2 col3 col4
0    A    W    Z    C
3    B    C    B    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
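Because str.contains treats the pattern as a regular expression by default, joining raw strings with '|' only works while the targets contain no regex metacharacters. A hedged sketch that escapes each target with the standard-library re.escape first; the extra targets "1.5" and "C++" are hypothetical, only there to show the escaping:

```python
import re

import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

targets = ["A", "B"]

# Escape each target so characters like '.', '+' or '(' are matched literally
pattern = "|".join(map(re.escape, targets))

mask = df.stack().str.contains(pattern).groupby(level=0).any()
result = df[mask]
print(result)

# Hypothetical targets with metacharacters: each '.' and '+' gets a backslash
literal = "|".join(map(re.escape, ["1.5", "C++"]))
print(literal)
```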

4 Comments

You don't even need the unstack. any can operate over index levels, so df.stack().str.contains('A').any(level=0) gets you the mask.
@ALollz very nice, I should explore more :) I thought only aggregate functions had a level parameter, but I didn't consider any.
Yeah, any, sum and a few others support the level argument. I don't think there's any real gain from df.sum(level=0) over df.groupby(level=0).sum(), other than it being less verbose.
You're right :) I always forget that I can do this, @ALollz.

You can use apply to run str.contains on every column, then any along axis=1:

df[df.apply(lambda x: x.str.contains('A')).any(axis=1)]

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
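One difference from the equality-based answers: str.contains matches substrings, so a cell like 'BA' also counts as containing 'A'. With the question's single-letter values this makes no difference, but it can on real data. A small sketch with hypothetical values (df2 is not from the question), passing regex=False so 'A' is treated as a plain string:

```python
import pandas as pd

# Hypothetical values where substring matching and exact matching differ
df2 = pd.DataFrame({"col1": ["A", "BA", "C"], "col2": ["X", "Y", "Z"]})

# Substring match: any cell that contains "A" anywhere
substring = df2[df2.apply(lambda col: col.str.contains("A", regex=False)).any(axis=1)]

# Exact match: only cells equal to "A"
exact = df2[df2.eq("A").any(axis=1)]

print(substring.index.tolist())  # [0, 1]
print(exact.index.tolist())      # [0]
```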

Or:

s = df.stack()
s[s.str.contains('A').groupby(level=0).transform('any')].unstack()

  col1 col2 col3 col4
0    A    W    Z    C
4    M    A    V    C
5    D    O    X    A
7    Q    V    R    A
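The second variant relies on transform rather than an aggregation: transform('any') computes one result per row group and broadcasts it back to every cell of that group, so the mask has the same length as s and can index it directly before unstacking. A sketch comparing the two lengths on the hand-rebuilt sample data:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "col1": list("AFEBMDYQ"),
    "col2": list("WWPCAOLV"),
    "col3": list("ZPYBVXYR"),
    "col4": list("CFCCCADA"),
})

s = df.stack()
cell_hits = s.str.contains("A")

agg = cell_hits.groupby(level=0).any()              # one value per row
bcast = cell_hits.groupby(level=0).transform("any")  # one value per cell

print(len(s), len(agg), len(bcast))  # 32 8 32

# Keep every cell of the matching rows, then pivot back to a frame
result = s[bcast].unstack()
print(result)
```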
