
For simplicity, since my data set is very large, let's say I have a dataframe:

import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

I need to filter this dataframe so that an entire row is eliminated when a specified column's value contains a partial, case-insensitive string ('foo'). I tried the following to no avail. PS, my regex skills are trash, so forgive me if that's the reason it's not working.

df = df[df['Col_A'] != '^[Ff][Oo][Oo].*']

Due to the size of my dataset, efficiency is a concern, which is why I have not opted for the iteration route. Thanks in advance.

  • @Wiktor Stribiżew the question that you marked as duplicate seems to concern filtering entire columns, rather than the content contained within the columns. Commented Aug 21, 2019 at 23:39

2 Answers


Use str.match

df[~df['Col_A'].str.match('^[Ff][Oo][Oo].*')]

Result

    Col_A   Col_B
1   Bar     Bar2
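The bracketed character classes aren't needed: str.match takes a case parameter, so the same filter can be written with a plain pattern. A minimal sketch, reusing the toy dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

# case=False makes the match case-insensitive, so '[Ff][Oo][Oo]'
# collapses to just 'foo'; str.match anchors at the start of the string
filtered = df[~df['Col_A'].str.match('foo', case=False)]
print(filtered)
```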

1 Comment

This solution is just what I needed and seems to be moldable for other situations I need to do this in. Thank you so much.

Another method would be to use str.startswith with str.lower and the NOT operator ~:

df[~df['Col_A'].str.lower().str.startswith('foo')]

Output

  Col_A Col_B
1   Bar  Bar2
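Note that startswith (like str.match) only tests the beginning of each value. Since the question says the column "contains" the partial string, str.contains may be the closer fit; a sketch, assuming a hypothetical dataframe where 'foo' appears mid-string:

```python
import pandas as pd

df = pd.DataFrame([['Foo', 'A1'], ['Bar', 'A2'], ['BarFoo', 'A3']],
                  columns=['Col_A', 'Col_B'])

# str.contains matches the pattern anywhere in the value, so
# 'BarFoo' is also dropped even though it does not start with 'foo'
filtered = df[~df['Col_A'].str.contains('foo', case=False)]
print(filtered)
```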

