
For simplicity, since my data set is very large, let's say I have a dataframe:

import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

I need to filter this dataframe so that an entire row is eliminated when a specified column's value contains a partial, case-insensitive string ('foo'). I tried the following to no avail. PS, my regex skills are trash, so forgive me if that's the reason it's not working.

df = df[df['Col_A'] != '^[Ff][Oo][Oo].*']

Due to the size of my dataset, efficiency is a concern, which is why I have not opted for the iteration route. Thanks in advance.

  • @Wiktor Stribiżew the question that you marked as duplicate seems to concern filtering entire columns, rather than the content contained within the columns. Commented Aug 21, 2019 at 23:39

2 Answers


Use str.match

df[~df['Col_A'].str.match('^[Ff][Oo][Oo].*')]

Result

    Col_A   Col_B
1   Bar     Bar2
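The bracketed character classes aren't needed: str.match takes a case parameter, so the same filter can be written with a plain pattern. A minimal sketch, reusing the toy dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame([['Foo', 'Foo1'], ['Bar', 'Bar2'], ['FooBar', 'FooBar3']],
                  columns=['Col_A', 'Col_B'])

# case=False makes the match case-insensitive, so '[Ff][Oo][Oo]'
# collapses to just 'foo'; str.match anchors at the start of the string
filtered = df[~df['Col_A'].str.match('foo', case=False)]
print(filtered)
```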

1 Comment

This solution is just what I needed and seems to be moldable for other situations I need to do this in. Thank you so much.

Another method would be to use str.startswith with str.lower and the NOT operator ~:

df[~df['Col_A'].str.lower().str.startswith('foo')]

Output

  Col_A Col_B
1   Bar  Bar2
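Note that startswith (like str.match) only tests the beginning of each value. Since the question says the column "contains" the partial string, str.contains may be the closer fit; a sketch, assuming a hypothetical dataframe where 'foo' appears mid-string:

```python
import pandas as pd

df = pd.DataFrame([['Foo', 'A1'], ['Bar', 'A2'], ['BarFoo', 'A3']],
                  columns=['Col_A', 'Col_B'])

# str.contains matches the pattern anywhere in the value, so
# 'BarFoo' is also dropped even though it does not start with 'foo'
filtered = df[~df['Col_A'].str.contains('foo', case=False)]
print(filtered)
```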

