(Python) Selecting rows containing a string in ANY column?

Question

I am trying to iterate through a dataframe and return the rows that contain a string "x" in any column.

This is what I have been trying

for col in df:

     rows = df[df[col].str.contains(searchTerm, case = False, na = False)]

However, it only returns up to 2 rows if I search for a string I know exists there and in more rows.

How do I make sure it is searching every row of every column?

Edit: My end goal is to get the row and column of the cell containing the string searchTerm

Are you searching for a full string values or a substring? If it's the full string then do rows = np.where(df.values == 'x')[0]. Don't iterate through dataframes if you can help it. — NotAName
– NotAName, Commented Feb 4, 2021 at 0:59

mayosten · Accepted Answer · 2021-02-04 15:17:21Z

Welcome!

Agree with all the comments. It's generally best practice to find a way to accomplish what you want in Pandas/Numpy without iterating over rows/columns.

If the objective is to "find rows where any column contains the value 'x'), life is a lot easier than you think.

Below is some data:

import pandas as pd
df = pd.DataFrame({
        'a': range(10), 
        'b': ['x', 'b', 'c', 'd', 'x', 'f', 'g', 'h', 'i', 'x'],
        'c': [False, False, True, True, True, False, False, True, True, True],
        'd': [1, 'x', 3, 4, 5, 6, 7, 8, 'x', 10]
})

print(df)
   a  b      c   d
0  0  x  False   1
1  1  b  False   x
2  2  c   True   3
3  3  d   True   4
4  4  x   True   5
5  5  f  False   6
6  6  g  False   7
7  7  h   True   8
8  8  i   True   x
9  9  x   True  10

So clearly rows 0, 1, 4, 8 and 9 should be included.

If we just do df == 'x', pandas broadcasts the comparison across the whole dataframe:

df == 'x'
       a      b      c      d
0  False   True  False  False
1  False  False  False   True
2  False  False  False  False
3  False  False  False  False
4  False   True  False  False
5  False  False  False  False
6  False  False  False  False
7  False  False  False  False
8  False  False  False   True
9  False   True  False  False

But pandas also has the handy .any method, to check for True in any dimension. So if we want to check across all columns, we want dimension 1:

rows = (df == 'x').any(axis=1)

print(rows)
0     True
1     True
2    False
3    False
4     True
5    False
6    False
7    False
8     True
9     True

Note that if you want your solution to be truly case sensitive like what you're using with the .str method, you might need something more like:

rows = (df.applymap(lambda x: str(x).lower() == 'x')).any(axis=1)

The correct rows are flagged without any looping. And you get a series back that can be used for indexing the original dataframe:


df.loc[rows]
   a  b      c   d
0  0  x  False   1
1  1  b  False   x
4  4  x   True   5
8  8  i   True   x
9  9  x   True  10

Collectives™ on Stack Overflow

(Python) Selecting rows containing a string in ANY column?

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related