13

I don't understand pandas DataFrame filter.

Setup

import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ["I'm", 'Tired']
    ]
)

Problem

df.filter([0], regex=r'(Hel|Just)', axis=0)

I'd expect the [0] to specify the 1st column as the one to look at and axis=0 to specify filtering rows. What I get is this:

       0      1
0  Hello  World

I was expecting

       0       1
0  Hello   World
1   Just  Wanted

Question

  • What would have gotten me what I expected?

3 Answers

18

Per the docs,

Arguments are mutually exclusive, but this is not checked for

So it appears that the first optional argument, items=[0], takes precedence over the third optional argument, regex=r'(Hel|Just)', which is silently ignored.

In [194]: df.filter([0], regex=r'(Hel|Just)', axis=0)
Out[194]: 
       0      1
0  Hello  World

is equivalent to

In [201]: df.filter([0], axis=0)
Out[201]: 
       0      1
0  Hello  World

which is merely selecting the row(s) with index values in [0] along the 0-axis.
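To make the point about labels concrete, here is a minimal sketch (my own, not from the docs) showing that the regex argument of filter is matched against the index labels, converted to strings, rather than against the cell values. The variable names by_label and by_value_attempt are just illustrative. Also, as far as I can tell, newer pandas releases do check the mutual exclusivity and raise a TypeError when both items and regex are passed, so the call from the question may not run at all on a current install.

```python
import pandas as pd

df = pd.DataFrame(
    [["Hello", "World"], ["Just", "Wanted"], ["To", "Say"], ["I'm", "Tired"]]
)

# With axis=0 the regex is tested against the row labels 0, 1, 2, 3
# (as strings), so this keeps the rows labelled 0 and 1.
by_label = df.filter(regex=r"^[01]$", axis=0)

# No row label contains "Hel" or "Just", so nothing is selected,
# even though those strings appear in the cell values.
by_value_attempt = df.filter(regex=r"Hel|Just", axis=0)

print(by_label)
print(by_value_attempt)
```

So filter is a tool for picking rows or columns by name; selecting rows by their contents needs a boolean mask instead.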


To get the desired result, you could use str.contains to create a boolean mask, and use df.loc to select rows:

In [210]: df.loc[df.iloc[:,0].str.contains(r'(Hel|Just)')]
Out[210]: 
       0       1
0  Hello   World
1   Just  Wanted
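
It may help to look at the mask on its own before passing it to df.loc. The sketch below is only an illustration: the names mask and result are mine, and I have written the pattern with a non-capturing group, (?:...), because pandas tends to emit a UserWarning about match groups when str.contains is given a capturing group.

```python
import pandas as pd

df = pd.DataFrame(
    [["Hello", "World"], ["Just", "Wanted"], ["To", "Say"], ["I'm", "Tired"]]
)

# Boolean Series: True where the first column contains "Hel" or "Just".
# The (?:...) group is non-capturing, which avoids the match-group warning.
mask = df.iloc[:, 0].str.contains(r"(?:Hel|Just)")

# df.loc keeps only the rows where the mask is True.
result = df.loc[mask]

print(mask)
print(result)
```

If the column can contain missing values, passing na=False to str.contains should keep the mask strictly boolean.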

12

This should work:

df[df[0].str.contains('(Hel|Just)', regex=True)]

1

Here is a chaining method:

df.loc[lambda x: x['column_name'].str.contains(regex_pattern, regex=True)]
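
For what it's worth, here is a small sketch of how that could look inside an actual chain. The column names first and second, the added length column, and the pattern variable are all made up for the example; the point is only that the lambda receives the intermediate frame, so it can refer to columns created earlier in the chain.

```python
import pandas as pd

df = pd.DataFrame(
    [["Hello", "World"], ["Just", "Wanted"], ["To", "Say"], ["I'm", "Tired"]],
    columns=["first", "second"],  # hypothetical column names
)

pattern = r"Hel|Just"

result = (
    df.assign(length=lambda x: x["first"].str.len())  # add a derived column
      .loc[lambda x: x["first"].str.contains(pattern, regex=True)]  # filter rows
      .reset_index(drop=True)
)

print(result)
```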
