Filtering Rows using Filter in Pandas Dataframe

Question

I am trying to learn how to use filter to select rows based on the following conditions:

Checking if col-a contains T2, and
Checking if col-b has a timestamp between 7 and 9

I thought filter would be a neat way to do this in a few lines of code, but I haven't been able to get the desired output, i.e. the rows that satisfy the above conditions. What other simple, Pythonic ways are there to do this (maybe where?)? I'd appreciate any help in understanding how filter works.

import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00'],
        }

# Filtering based on col-a - contains T-id
original_df = pd.DataFrame(data)
print("\n ORIGINAL DF\n", original_df)
filtered_a_df = original_df.filter(like='.T2', axis=0)
print("\n FILTERED DF\n", filtered_a_df)

# Filtering based on col-b - time between 7 and 9
filtered_b_df = original_df.filter(regex='^0[79]:', axis=0)
print("\n FILTERED DF\n", filtered_b_df)

sacuL · Accepted Answer · 2018-10-25 00:24:27Z


From the docs:

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
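To see the docs' point with the question's own data: the DataFrame built in the question has the default integer index 0, 1, 2, so filter(axis=0) compares the patterns against the index labels (as strings '0', '1', '2'), never against the cell values, and both calls come back empty. A quick check, assuming a recent pandas version (by_like and by_regex are just illustrative names):

```python
import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00']}
original_df = pd.DataFrame(data)

# The default index labels are the integers 0, 1, 2
print(list(original_df.index))

# filter(axis=0) tests each index label, not the cell contents,
# so neither pattern from the question matches any row
by_like = original_df.filter(like='.T2', axis=0)
by_regex = original_df.filter(regex='^0[79]:', axis=0)
print(by_like.empty, by_regex.empty)
```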

From your question, it looks like you're trying to filter based on the contents of your dataframe, so use ordinary boolean indexing instead:

filtered_a_df = original_df[original_df['col-a'].str.contains('T2')]

filtered_b_df = original_df[original_df['col-b'].between('07:00:00','09:00:00')]

>>> filtered_a_df
        col-a     col-b
1  xyz.T2.456  09:17:00
2  xyz.T2.456  12:57:00
>>> filtered_b_df
         col-a     col-b
0  abcd.T1.123  07:57:00
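The question asks for rows that meet both conditions at once, which means combining the two boolean masks with & (Python's and does not work element-wise on a Series). A minimal sketch; the '09:59:59' upper bound is an assumption that "between 7 and 9" should include the whole 9 o'clock hour, not something stated in the answer above:

```python
import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00']}
original_df = pd.DataFrame(data)

# Each condition is a boolean Series aligned on the index
has_t2 = original_df['col-a'].str.contains('T2', regex=False)

# Zero-padded 'HH:MM:SS' strings sort in time order, so string comparison works.
# Assumption: the whole 9 o'clock hour counts, hence '09:59:59'
in_window = original_df['col-b'].between('07:00:00', '09:59:59')

# Combine element-wise with &, then index with the combined mask
both_df = original_df[has_t2 & in_window]
print(both_df)

# With the stricter '09:00:00' bound, no sample row satisfies both conditions
strict_df = original_df[has_t2 & original_df['col-b'].between('07:00:00', '09:00:00')]
print(strict_df.empty)
```

Which bound is right depends on what "7 to 9" means for your data. The string comparison also relies on every value being a zero-padded HH:MM:SS string; for messier data, converting first with pd.to_timedelta or pd.to_datetime would be safer.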

To further explain filter, your conditions could work if you were trying to filter based on the index. For instance, if you have df2 as the original dataframe but with col-a as your index, then you can use filter:

df2 = original_df.set_index('col-a')
>>> df2
                col-b
col-a                
abcd.T1.123  07:57:00
xyz.T2.456   09:17:00
xyz.T2.456   12:57:00

# In this case you can use either regex or like arguments
>>> df2.filter(regex='T2', axis=0)
               col-b
col-a               
xyz.T2.456  09:17:00
xyz.T2.456  12:57:00
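Both arguments test the index labels, just in different ways: like is a plain substring check, while regex keeps labels for which re.search(regex, label) finds a match, so the pattern can be anchored or use escapes. A short sketch comparing them on df2 (by_like, by_regex and by_prefix are illustrative names):

```python
import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00']}
df2 = pd.DataFrame(data).set_index('col-a')

# like= keeps index labels that contain the substring
by_like = df2.filter(like='T2', axis=0)

# regex= keeps labels where re.search finds the pattern
by_regex = df2.filter(regex='T2', axis=0)

# An anchored pattern with an escaped '.' is only expressible with regex=
by_prefix = df2.filter(regex=r'^xyz\.', axis=0)

print(by_like.equals(by_regex))  # same two rows either way
print(by_prefix)
```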

Or you can filter columns as well. Going back to your original df, you can, for instance, filter columns that have -b in the name:

>>> original_df.filter(like='-b',axis=1)
      col-b
0  07:57:00
1  09:17:00
2  12:57:00
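filter also accepts an items argument for exact column names, and on a DataFrame it works on the columns axis when axis is omitted. A small sketch of both with the question's data ('col-z' is a deliberately missing, hypothetical column name, to show that absent labels are simply skipped):

```python
import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00']}
original_df = pd.DataFrame(data)

# items= keeps exact column names; names that don't exist are ignored
by_items = original_df.filter(items=['col-b', 'col-z'], axis=1)

# regex= on column names: keeps columns ending in '-a' or '-b'
by_regex = original_df.filter(regex=r'-[ab]$', axis=1)

# Without axis, DataFrame.filter works on the column labels
by_default = original_df.filter(like='-b')

print(list(by_items.columns))
print(list(by_regex.columns))
print(by_default.equals(original_df.filter(like='-b', axis=1)))
```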

answered Oct 25, 2018 at 0:17 by sacuL; edited Oct 25, 2018 at 0:24



3 Comments

Vandhana Over a year ago

But if I use axis=0, it filters based on row contents, no?

sacuL Over a year ago

No, not the contents, the index. axis=0 filters on the index labels and axis=1 filters on the column names. As noted in the quote I posted and in the docs, this routine does not filter a dataframe on its contents.

sacuL Over a year ago

Glad I could help! It's a strange function, especially if you happen to come from a dplyr background, where filter can do what you were originally describing.
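For the "maybe where?" part of the question, and for readers coming from dplyr: DataFrame.where keeps the full shape and replaces rows that fail the condition with NaN, whereas boolean indexing drops them. For a dplyr-style method chain, .loc also accepts a callable that receives the DataFrame. A sketch of both, again assuming the '09:59:59' upper bound for the time window is acceptable:

```python
import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00']}
original_df = pd.DataFrame(data)

mask = original_df['col-a'].str.contains('T2', regex=False)

# where() keeps every row: rows failing the mask become all-NaN
masked = original_df.where(mask)
print(masked)

# Dropping the all-NaN rows leaves the same rows as boolean indexing
print(masked.dropna().index.equals(original_df[mask].index))

# Closest analogue to dplyr's filter() in a chain: .loc with a callable.
# Assumption: '09:59:59' as the upper bound of the time window
chained = (original_df
           .loc[lambda d: d['col-a'].str.contains('T2', regex=False)]
           .loc[lambda d: d['col-b'].between('07:00:00', '09:59:59')])
print(chained)
```

Each .loc step sees the result of the previous one, so the chain reads top to bottom like a dplyr pipeline; it selects the same rows as combining the masks with &.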
