3

To illustrate what I want to achieve, here is a DataFrame called df:

column1  column2  
1        foo faa
2        bar car
3        dog dog
4        cat rat
5        foo foo
6        bar cat
7        bird rat
8        cat dog
9        bird foo
10       bar car

I want to subset the DataFrame, dropping every row whose string in column2 contains any one of several values.

This is easy enough for a single value, in this instance 'foo':

df = df[~df['column2'].str.contains("foo")]

But let's say I wanted to drop all rows in which the strings in column2 contained 'cat' or 'foo'. As applied to df above, this would drop 5 rows.

What would be the most efficient, most Pythonic way to do this? This could be in the form of a function, multiple boolean masks, or something else I'm not thinking of.

isin doesn't work as it requires exact matches.

N.B: I have edited this question as I made a mistake with it the first time round. Apologies.

  • Next time, consider posting a new question, since the original problem was already solved by @EdChum's answer. Commented Jan 17, 2016 at 10:43
  • I've learnt some valuable lessons from you and @EdChum with this question. I won't make the same mistakes again. Thanks. Commented Jan 17, 2016 at 11:04

2 Answers

6

Use isin to test for membership of a list of values and negate ~ the boolean mask:

In [3]:
vals = ['bird','cat','foo']

df[~df['column2'].isin(vals)]
Out[3]:
   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar

In [4]:
df['column2'].isin(vals)

Out[4]:
0     True
1    False
2    False
3     True
4     True
5    False
6     True
7     True
8     True
9    False
Name: column2, dtype: bool
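To see why isin only catches exact matches while str.contains also catches substrings, here is a minimal sketch. The Series s and its values are made up for illustration and are not the question's data:

```python
import pandas as pd

# Hypothetical sample values, chosen to mix exact and partial matches
s = pd.Series(["foo", "foo faa", "bar cat", "dog"])

# isin compares each whole value against the list (exact equality)
exact = s.isin(["foo", "cat"])

# str.contains searches inside each string; "foo|cat" is a regex meaning foo OR cat
partial = s.str.contains("foo|cat")

print(exact.tolist())    # [True, False, False, False]
print(partial.tolist())  # [True, True, True, False]
```

Only the first value equals "foo" exactly, so isin flags it alone, whereas str.contains also flags "foo faa" and "bar cat".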

3 Comments

I made a mistake with the initial question. You answered it correctly but only latterly did I realise isin requires exact matches. I've edited the question to reflect that I need to be able to select strings containing specific values and not exact matches. I think this will almost certainly necessitate using str.contains() somehow. Sorry.
It's really annoying when people post questions that don't detail exactly what they want. In this case Fabio's answer is what you want. In the future, you need to state your exact requirements and post representative data and the desired output. Changing the question to "this is what I really want" is incredibly annoying and wastes people's time; in certain situations it may be better to post another question.
I'm very sorry I have annoyed you and wasted your time Ed. Lesson learnt.
5

You can build a logical mask by combining negated str.contains checks with &:

df = df[(~df['column2'].str.contains("foo")) & (~df['column2'].str.contains("bird")) & (~df['column2'].str.contains("cat"))]

that returns:

   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar
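The chained mask above can also be collapsed into a single str.contains call by joining the values into one regular expression. The following is a minimal sketch rather than a definitive implementation. It assumes the question's data is rebuilt by hand with the two-word strings in column2; the names terms, pattern, mask and result are illustrative, and re.escape is added in case a search term contains regex metacharacters:

```python
import re
import pandas as pd

# Data rebuilt by hand from the question (assumption: column2 holds the two-word strings)
df = pd.DataFrame({
    "column1": range(1, 11),
    "column2": ["foo faa", "bar car", "dog dog", "cat rat", "foo foo",
                "bar cat", "bird rat", "cat dog", "bird foo", "bar car"],
})

# Substrings that should cause a row to be dropped
terms = ["foo", "bird", "cat"]

# Join the escaped terms with "|" so the regex matches any one of them
pattern = "|".join(re.escape(t) for t in terms)

# True where column2 contains at least one term; na=False treats missing values as no match
mask = df["column2"].str.contains(pattern, na=False)

# Keep only the rows that match none of the terms
result = df[~mask]
print(result)
```

With this data, the rows that remain are those with column1 equal to 2, 3 and 10. Adding another term only means extending the terms list, rather than chaining another condition.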
