1

I am trying to iterate through a dataframe and return the rows that contain a string "x" in any column.

This is what I have been trying

for col in df:

     rows = df[df[col].str.contains(searchTerm, case = False, na = False)]

However, it only returns up to 2 rows if I search for a string I know exists there and in more rows.

How do I make sure it is searching every row of every column?

Edit: My end goal is to get the row and column of the cell containing the string searchTerm

2
  • Are you searching for a full string values or a substring? If it's the full string then do rows = np.where(df.values == 'x')[0]. Don't iterate through dataframes if you can help it. Commented Feb 4, 2021 at 0:59
  • 1
    kindly share sample data with expected output Commented Feb 4, 2021 at 1:02

1 Answer 1

3

Welcome!

Agree with all the comments. It's generally best practice to find a way to accomplish what you want in Pandas/Numpy without iterating over rows/columns.

If the objective is to "find rows where any column contains the value 'x'), life is a lot easier than you think.

Below is some data:

import pandas as pd
df = pd.DataFrame({
        'a': range(10), 
        'b': ['x', 'b', 'c', 'd', 'x', 'f', 'g', 'h', 'i', 'x'],
        'c': [False, False, True, True, True, False, False, True, True, True],
        'd': [1, 'x', 3, 4, 5, 6, 7, 8, 'x', 10]
})

print(df)
   a  b      c   d
0  0  x  False   1
1  1  b  False   x
2  2  c   True   3
3  3  d   True   4
4  4  x   True   5
5  5  f  False   6
6  6  g  False   7
7  7  h   True   8
8  8  i   True   x
9  9  x   True  10

So clearly rows 0, 1, 4, 8 and 9 should be included.

If we just do df == 'x', pandas broadcasts the comparison across the whole dataframe:

df == 'x'
       a      b      c      d
0  False   True  False  False
1  False  False  False   True
2  False  False  False  False
3  False  False  False  False
4  False   True  False  False
5  False  False  False  False
6  False  False  False  False
7  False  False  False  False
8  False  False  False   True
9  False   True  False  False

But pandas also has the handy .any method, to check for True in any dimension. So if we want to check across all columns, we want dimension 1:

rows = (df == 'x').any(axis=1)

print(rows)
0     True
1     True
2    False
3    False
4     True
5    False
6    False
7    False
8     True
9     True

Note that if you want your solution to be truly case sensitive like what you're using with the .str method, you might need something more like:

rows = (df.applymap(lambda x: str(x).lower() == 'x')).any(axis=1)

The correct rows are flagged without any looping. And you get a series back that can be used for indexing the original dataframe:


df.loc[rows]
   a  b      c   d
0  0  x  False   1
1  1  b  False   x
4  4  x   True   5
8  8  i   True   x
9  9  x   True  10
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.