Subset rows based on multiple iterative columns in Pandas

Question

I need to subset rows of df based on several columns (c1 through c100 columns) which have strings. Subset rows which equal a particular value for any of the 100 columns. Minimal example with 3 columns is:

df = pd.DataFrame({"": [0,1,2,3,4,5,6,7,8],
                     "c1": ["abc1", "", "dfg", "abc1", "","dfg","ghj","abc1","abc1"], 
                     "c2": ["abc1", "abc1", "dfg", "dfg", "","dfg","","ghj","abc1"], 
                     "c3": ["abc1", "", "dfg", "dfg", "dfg","dfg","abc1","ghj","abc1"]})

        c1      c2      c3
0   0   abc1    abc1    abc1
1   1           abc1    
2   2   dfg     dfg     dfg
3   3   abc1    dfg     dfg
4   4                   dfg
5   5   dfg     dfg     dfg
6   6   ghj             abc1
7   7   abc1    ghj     ghj
8   8   abc1    abc1    abc1

The loc command gives us the answer but too many 'or' conditions to write. I am looking for something iterative over 100 columns. Also what if I want to check for a list of strings instead of just 1 string. For example ['abc1', 'ghj'], instead of just 'abc1'.

df.loc[(df['c1'] == "abc1") | (df['c2'] == "abc1") | (df['c3'] == "abc1")]

Akshay Sehgal · Accepted Answer · 2022-12-24 23:40:44Z

4

1. Searching for a single string

You can just check each "cell" with df.eq('abc1') or df=='abc1' and then take an OR over columns using .any(1) to check if each row has at least 1 instance of 'abc1'. Then use that boolean series for indexing the original data frame to get the rows you need.

output = df[df.eq('abc1').any(1)]

print(output)

        c1    c2    c3
0  0  abc1  abc1  abc1
1  1        abc1      
3  3  abc1   dfg   dfg
6  6   ghj        abc1
7  7  abc1   ghj   ghj
8  8  abc1  abc1  abc1

If you want to check only a specific set of columns then use df[list_of_columns] == 'abc1' instead.

2. Searching for multiple strings

For multiple strings use df.isin([list-of-strings]). Here is an example of how that would work.

output = df[df.isin(['abc1', 'dfg']).any(1)]

print(output)

        c1    c2    c3
0  0  abc1  abc1  abc1
1  1        abc1      
2  2   dfg   dfg   dfg
3  3  abc1   dfg   dfg
4  4               dfg
5  5   dfg   dfg   dfg
6  6   ghj        abc1
7  7  abc1   ghj   ghj
8  8  abc1  abc1  abc1

3. Searching for multiple strings across specific columns

If you want to search for multiple strings in a list of specific columns, you can just modify the previous .isin or .eq method with the list of columns filtered, and then continue the same method. Here is an example for the above script working for just 2 columns c1 and c3

output = df[df[['c1','c3']].isin(['abc1','ghj']).any(1)]

print(output)

        c1    c2    c3
0  0  abc1  abc1  abc1
3  3  abc1   dfg   dfg
6  6   ghj        abc1
7  7  abc1   ghj   ghj
8  8  abc1  abc1  abc1

References

Read more here on the official documentation -

edited Dec 24, 2022 at 23:40

answered Dec 24, 2022 at 23:21

Akshay Sehgal

19.4k3 gold badges26 silver badges57 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

AAA Over a year ago

What to do to check two strings instead of just 1?

Akshay Sehgal Over a year ago

You can just use these as independent conditions. How many strings do you want to check?

Akshay Sehgal Over a year ago

Added another way for multiple strings.

Akshay Sehgal Over a year ago

Does my EDIT work for you? if yes, do mark the answer! thanks!

Akshay Sehgal Over a year ago

Glad to help. I have added another edit for a handling multiple columns as well. do check!

Collectives™ on Stack Overflow

Subset rows based on multiple iterative columns in Pandas

1 Answer 1

1. Searching for a single string

2. Searching for multiple strings

3. Searching for multiple strings across specific columns

References

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1. Searching for a single string

2. Searching for multiple strings

3. Searching for multiple strings across specific columns

References

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related