I am trying to learn to use filter to get rows based on the below conditions.

  1. Checking if col-a contains T2 and
  2. Checking if col-b has a time stamp between 7 and 9

I thought filter would be a neat way to do this in a few lines of code, but I haven't been able to get the desired output, which is the rows that satisfy both conditions above. What other simple, Pythonic ways are there to do this (maybe where?)? I'd appreciate any help in understanding how filter works.

import pandas as pd

data = {'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
        'col-b': ['07:57:00', '09:17:00', '12:57:00'],
        }

# Filtering based on col-a - contains T-id
original_df = pd.DataFrame(data)
print("\n ORIGINAL DF\n", original_df)
filtered_a_df = original_df.filter(like='.T2', axis=0)
print("\n FILTERED DF\n", filtered_a_df)

# Filtering based on col-b - time between 7 and 9
filtered_b_df = original_df.filter(regex='^0[79]:', axis=0)
print("\n FILTERED DF\n", filtered_b_df)

1 Answer

From the docs:

Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

From your question, it seems very much like you're trying to filter based on the contents of your dataframe. So you can use regular indexing:

filtered_a_df = original_df[original_df['col-a'].str.contains('T2')]

filtered_b_df = original_df[original_df['col-b'].between('07:00:00','09:00:00')]

>>> filtered_a_df
        col-a     col-b
1  xyz.T2.456  09:17:00
2  xyz.T2.456  12:57:00
>>> filtered_b_df
         col-a     col-b
0  abcd.T1.123  07:57:00
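
Since the question asks for rows that satisfy both conditions, the two boolean masks can be combined with `&`. The following is a minimal sketch rather than a definitive solution. The names `has_t2`, `in_window` and `both_df` are made up for illustration, and the upper bound '09:59:59' is an assumption about what "between 7 and 9" means (the whole 09:00 hour included). With the '09:00:00' bound used above, no row in the sample data would match both conditions.

```python
import pandas as pd

original_df = pd.DataFrame({
    'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
    'col-b': ['07:57:00', '09:17:00', '12:57:00'],
})

# Mask 1: col-a contains the literal substring 'T2'
has_t2 = original_df['col-a'].str.contains('T2', regex=False)

# Mask 2: col-b falls in the time window (both ends inclusive by default).
# Comparing strings works here only because the times are zero-padded HH:MM:SS.
# The upper bound is an assumption, see the note above.
in_window = original_df['col-b'].between('07:00:00', '09:59:59')

# Keep only the rows where both masks are True
both_df = original_df[has_t2 & in_window]
print(both_df)
```

Keeping each mask in its own variable makes it easy to inspect which condition excludes a given row.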

To further explain filter, your conditions could work if you were trying to filter based on the index. For instance, if you have df2 as the original dataframe but with col-a as your index, then you can use filter:

df2 = original_df.set_index('col-a')
>>> df2
                col-b
col-a                
abcd.T1.123  07:57:00
xyz.T2.456   09:17:00
xyz.T2.456   12:57:00

# In this case you can use either regex or like arguments
>>> df2.filter(regex='T2',axis=0)

               col-b
col-a               
xyz.T2.456  09:17:00
xyz.T2.456  12:57:00

Or you can filter columns as well. Going back to your original df, you can, for instance, filter columns that have -b in the name:

>>> original_df.filter(like='-b',axis=1)
      col-b
0  07:57:00
1  09:17:00
2  12:57:00
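
To tie this back to the code in the question, here is a small sketch of why both filter calls came back empty. With the default index, the row labels are the integers 0, 1 and 2, so neither '.T2' nor the time pattern can match any of them. The names `by_like` and `by_regex` are made up for illustration.

```python
import pandas as pd

original_df = pd.DataFrame({
    'col-a': ['abcd.T1.123', 'xyz.T2.456', 'xyz.T2.456'],
    'col-b': ['07:57:00', '09:17:00', '12:57:00'],
})

# With axis=0, filter only ever looks at these row labels, not at cell values
print(list(original_df.index))

# Neither pattern occurs in the labels, so no rows are selected
by_like = original_df.filter(like='.T2', axis=0)
by_regex = original_df.filter(regex='^0[79]:', axis=0)
print(by_like.empty, by_regex.empty)
```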

3 Comments

But if I use axis=0, doesn't it filter based on row contents?
No, not the contents, the index. axis=0 will filter based on the index labels, and axis=1 will filter based on the column names. As noted in the quote I posted and in the docs, this routine does not filter a dataframe on its contents.
Glad I could help! It's a strange function, especially if you happen to come from a dplyr background, where filter can do what you were originally describing
