25

Some values in my risk column are not Small, Medium, or High. I want to delete the rows whose value is not one of Small, Medium, or High. I tried the following:

df = df[(df.risk == "Small") | (df.risk == "Medium") | (df.risk == "High")]

But this returns an empty DataFrame. How can I filter them correctly?

11
  • I've tried to create a DataFrame with such data, and your line of code works properly. Could you give more information about what the DataFrame contains and how you generate it? Commented Apr 27, 2014 at 14:39
  • Your requirement is a little unclear. If your values can only ever be Small, Medium, or High, and you want to drop rows that are any of these values, then this will result in no rows. Could you explain more clearly what you require? Commented Apr 27, 2014 at 14:49
  • Hmm.. your code is correct so I think you need to post data and code that reproduces your problem Commented Apr 27, 2014 at 15:04
  • For example, it would be useful to see what df.risk.value_counts() returns. Commented Apr 27, 2014 at 15:17
  • @EdChum, your previous (now deleted) post had df = df[df.risk.isin(['Small','Medium','High'])]. That gave the desired result! Commented Apr 27, 2014 at 16:40
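
As the comments note, the filter in the question is correct for exact matches, so an empty result suggests the stored strings differ from the literals being compared. The following is a minimal sketch under an assumption the question does not confirm: that the values carry stray whitespace or different capitalisation. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical data: the values look right but have stray spaces or mixed case.
df = pd.DataFrame({"risk": [" Small", "HIGH ", "medium", "Negligible"]})

wanted = ["Small", "Medium", "High"]

# Exact matching finds nothing, which reproduces an empty result.
exact = df[df["risk"].isin(wanted)]

# repr() exposes hidden whitespace that printing the frame would hide.
raw_values = [repr(v) for v in df["risk"].unique()]

# Normalise a copy of the column (assumed fix) before comparing.
normalised = df["risk"].str.strip().str.title()
cleaned = df[normalised.isin(wanted)]
```

Inspecting raw_values (or df.risk.value_counts(), as suggested above) is a quick way to check whether this assumption applies before changing any data.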

3 Answers

38

I think you want:

df = df[(df.risk.isin(["Small","Medium","High"]))]

Example:

In [5]:
import pandas as pd
df = pd.DataFrame({'risk':['Small','High','Medium','Negligible', 'Very High']})
df

Out[5]:

         risk
0       Small
1        High
2      Medium
3  Negligible
4   Very High

[5 rows x 1 columns]

In [6]:

df[df.risk.isin(['Small','Medium','High'])]

Out[6]:

     risk
0   Small
1    High
2  Medium

[3 rows x 1 columns]
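
For comparison, here is a small sketch that builds both the chained == mask from the question and the isin mask on the same sample data, and also inverts the mask with ~ to show the rows that would be dropped. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"risk": ["Small", "High", "Medium", "Negligible", "Very High"]})

# Chained comparisons, as written in the question.
chained = (df.risk == "Small") | (df.risk == "Medium") | (df.risk == "High")

# Membership test, as used in this answer.
member = df.risk.isin(["Small", "Medium", "High"])

# For this data, both masks select exactly the same rows.
same = chained.equals(member)

kept = df[member]
# ~ inverts the boolean mask, giving the rows that would be removed.
dropped = df[~member]
```

On clean data the two forms agree, so isin is mainly a more compact way to write the same condition.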

4 Comments

Aren't these logical statements equivalent to the one provided by the author?
I've tried your example with the author's slicing, and it works correctly.
@MikhailElizarev It is slightly different, but the OP's question is a little unclear, as what they are doing would produce no results.
To make it more universal, one could do this instead: df[df["risk"].isin(['Small','Medium','High'])]
4

Another nice and readable approach is the following:

small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"

Then you can use it like this:

df[small_risk | medium_risk | high_risk]

or

df[small_risk & medium_risk]
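
A runnable sketch of the named-mask approach follows. The sample data and the "value" column are hypothetical, added only to show a case where & is meaningful; combining two equality tests on the same column with & can never match a row.

```python
import pandas as pd

# Hypothetical sample data; "value" is not part of the original question.
df = pd.DataFrame({
    "risk": ["Small", "High", "Medium", "Negligible"],
    "value": [1, 5, 3, 2],
})

small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"

# OR keeps rows matching any of the masks.
any_known = df[small_risk | medium_risk | high_risk]

# AND of two equality tests on the same column is always empty.
both = df[small_risk & medium_risk]

# AND is useful when the conditions involve different columns.
high_and_large = df[high_risk & (df["value"] > 2)]
```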

2 Comments

df[small_risk & medium_risk] always returns nothing. df["risk"] cannot be simultaneously equal to "Small" AND "Medium".
That example was only meant to show the AND operator for cases where it is applicable.
2

You could also use query:

df.query('risk in ["Small","Medium","High"]')

You can refer to variables in the environment by prefixing them with @. For example:

lst = ["Small","Medium","High"]
df.query("risk in @lst")

If the column name is multiple words, e.g. "risk factor", you can refer to it by surrounding it with backticks ` `:

df.query('`risk factor` in @lst')

The query method comes in handy if you need to chain multiple conditions. For example, the outcome of the following filter:

df[df['risk factor'].isin(lst) & (df['value']**2 > 2) & (df['value']**2 < 5)]

can be derived using the following expression:

df.query('`risk factor` in @lst and 2 < value**2 < 5')
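
To check that the two forms agree, here is a self-contained sketch on hypothetical data (the values in "risk factor" and "value" are made up for illustration). It assumes a pandas version that supports backtick-quoted column names in query.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "risk factor": ["Small", "High", "Medium", "Negligible", "High"],
    "value": [1.0, 2.0, 1.5, 2.0, 3.0],
})
lst = ["Small", "Medium", "High"]

# Boolean-mask version: each condition needs its own parentheses.
mask_result = df[
    df["risk factor"].isin(lst)
    & (df["value"] ** 2 > 2)
    & (df["value"] ** 2 < 5)
]

# query version: backticks quote the column name, @ refers to a Python variable.
query_result = df.query("`risk factor` in @lst and 2 < value**2 < 5")
```

Both expressions select the rows at index 1 and 2 here; the query form avoids repeating df and the extra parentheses.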
