
If I have a dataframe and want to drop any rows where the value in one column is not an integer, how would I do this?

The alternative would be to drop rows where the value is not within the range 0-2, but since I am not sure how to do either of them I was hoping someone else might know.

Here is what I tried, but it didn't work and I am not sure why:

df = df[(df['entrytype'] != 0) | (df['entrytype'] !=1) | (df['entrytype'] != 2)].all(1)
  • Well, that won't work because of operator precedence, so you need parentheses; it should be: df = df[(df['entrytype'] != 0) | (df['entrytype'] != 1) | (df['entrytype'] != 2)].all(1). However, if you have any rows in the column that are not numeric then the dtype will be object, so could you not just test for this? (See the sketch after these comments.) Commented Feb 13, 2015 at 13:05
  • Yes, I did test this, so I was looking for an alternative due to the dtype issue. What are the alternatives? Commented Feb 13, 2015 at 13:28
  • You could do df[~df['entrytype'].isin([0,1,2])]; this selects the rows that are not 0, 1 or 2, if you are expecting the values to only be those. Commented Feb 13, 2015 at 13:34
  • Another way could be: df['entrytype'].apply(lambda x: str(x).isdigit()) Commented Feb 13, 2015 at 13:36
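
A minimal sketch of the precedence and logic point from the first comment, assuming a hypothetical entrytype column like the one in the question: chaining != tests with | keeps every row, because any value differs from at least two of 0, 1 and 2.

import pandas as pd
import numpy as np

df = pd.DataFrame({'entrytype': [0, 1, np.nan, 'asdas', 2]})

# OR-ing three "not equal" tests is always True, so this filter drops nothing
always_true = (df['entrytype'] != 0) | (df['entrytype'] != 1) | (df['entrytype'] != 2)
print(always_true.all())    # True

# "keep only 0, 1 or 2" needs AND of the negations (or simply isin, as in the answers)
keep = ~((df['entrytype'] != 0) & (df['entrytype'] != 1) & (df['entrytype'] != 2))
print(df[keep])             # rows 0, 1 and 4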

3 Answers


There are 2 approaches I propose:

In [212]:

import pandas as pd
import numpy as np
df = pd.DataFrame({'entrytype':[0, 1, np.nan, 'asdas', 2]})
df
Out[212]:
  entrytype
0         0
1         1
2       NaN
3     asdas
4         2

If the range of values is as restricted as you say then using isin will be the fastest method:

In [216]:

df[df['entrytype'].isin([0,1,2])]
Out[216]:
  entrytype
0         0
1         1
4         2

Otherwise we could cast to str and then call .isdigit():

In [215]:

df[df['entrytype'].apply(lambda x: str(x).isdigit())]
Out[215]:
  entrytype
0         0
1         1
4         2

4 Comments

Hi, both methods are good, but unfortunately only the second, slower method works for me. It must be because the values come in as strings when imported from CSV.
When loading from CSV, if you don't specify or coerce the dtype then pandas tries to guess, and if you have non-numeric values it probably reads them in as str types. What are the errant values in your rows? It may be quicker to do df.convert_objects(convert_numeric=True) and then call df.dropna() (see the sketch after these comments for a current equivalent).
OK, I did this and it worked as well: df2 = df[df['entrytype'].isin(['0','1','2'])], but your way is cleaner I think.
Ideally the dtypes should be set to the correct type; I would change to int if possible. However, if you have missing values then this can't be done, as NaN cannot be represented by ints but can be represented by floats.
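
A sketch of the coerce-then-drop approach from the comment above; convert_objects has since been removed from pandas, so this assumes the current equivalent pd.to_numeric and the entrytype column from the answer:

import pandas as pd
import numpy as np

df = pd.DataFrame({'entrytype': [0, 1, np.nan, 'asdas', 2]})

# Anything that cannot be parsed as a number becomes NaN, then those rows are dropped
df['entrytype'] = pd.to_numeric(df['entrytype'], errors='coerce')
df = df.dropna(subset=['entrytype'])
print(df)    # rows 0, 1 and 4 remain, as float64 because NaN forces a float column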

There are multiple ways to do the same thing, but I found these methods easy and efficient.

Quick Examples

# Using drop() to delete rows based on a column value
df.drop(df[df['Fee'] >= 24000].index, inplace=True)

# Keep only the rows that match a condition (boolean indexing)
df2 = df[df.Fee >= 24000]

# If the column name contains a space,
# specify it with bracket notation and quotes
df2 = df[df['column name'] >= 24000]

# Using loc
df2 = df.loc[df["Fee"] >= 24000]

# Select rows based on multiple column values
df2 = df[(df['Fee'] >= 22000) & (df['Discount'] == 2300)]

# Drop rows with None/NaN in a column
df2 = df[df.Discount.notnull()]
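
Applied to the original question, the same drop() pattern would look something like this (a sketch, assuming the entrytype column from the question and the values 0-2 the asker wants to keep):

import pandas as pd
import numpy as np

df = pd.DataFrame({'entrytype': [0, 1, np.nan, 'asdas', 2]})

# The boolean mask selects the unwanted rows; .index feeds their labels to drop()
df.drop(df[~df['entrytype'].isin([0, 1, 2])].index, inplace=True)
print(df)    # rows 0, 1 and 4 remain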

Comments


str("-1").isdigit() is False

str("-1").lstrip("-").isdigit() works but is not nice.


A regex that also allows a leading sign works instead:

df.loc[df['Feature'].str.match(r'^[+-]?\d+$')]

and for your question, the reverse set (the rows that are not integers):

df.loc[~(df['Feature'].str.match(r'^[+-]?\d+$'))]
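
A short usage sketch, assuming (as in the comments on the accepted answer) that the column arrived from CSV as strings; na=False fills missing values with False so the mask can be used for indexing:

import pandas as pd

df = pd.DataFrame({'Feature': ['0', '1', '-1', 'asdas', '2']})

# isdigit() misses the negative integer
print(df['Feature'].str.isdigit())    # '-1' -> False

# The signed-integer regex keeps it
keep = df['Feature'].str.match(r'^[+-]?\d+$', na=False)
print(df.loc[keep])                   # rows 0, 1, 2 and 4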

Comments
