
I checked this post: "Finding non-numeric rows in dataframe in pandas", but it doesn't really answer my question.

my sample data:

import pandas as pd


d = {
 'unit': ['UD', 'UD', 'UD', 'UD', 'UD','UD'],
 'N-D': [ 'Q1', 'Q2', 'Q3', 'Q4','Q5','Q6'],
 'num' : [ -1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25]

}
df = pd.DataFrame(d)

it looks like this:

  N-D     num unit
0  Q1   -1.48   UD
1  Q2     1.7   UD
2  Q3   -6.18   UD
3  Q4    0.25   UD
4  Q5  sum(d)   UD
5  Q6    0.25   UD

I want to keep only the rows whose value in column 'num' is NON-NUMERIC, returning all of the columns for those rows.

desired output:

  N-D   num   unit
4  Q5   sum(d) UD

my attempts:

import numpy as np

# didn't work: it pulled out everything, and besides, I want the
# condition to check only column 'num'
nonnumeric = df[~df.applymap(np.isreal).all(1)]

# didn't work: it pulled out the right rows, but only column 'num'
nonnumeric = df['num'][~df.applymap(np.isreal).all(1)]
A comment suggested:

non_num_rows = list(df[pd.to_numeric(df['num'], errors='coerce').isnull()].index)
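The comment above can be expanded into a runnable sketch (isna() is the newer pandas spelling of isnull()):

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD'] * 6,
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25],
})

# Coerce 'num' to numeric; anything unparseable becomes NaN,
# so isna() marks exactly the non-numeric rows.
mask = pd.to_numeric(df['num'], errors='coerce').isna()
non_num_rows = list(df[mask].index)
print(non_num_rows)  # [4]
```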

5 Answers


Use boolean indexing with a mask created by to_numeric + isnull.
Note: this solution does not find or filter numbers saved as strings, like '1' or '22'.

print (pd.to_numeric(df['num'], errors='coerce'))
0   -1.48
1    1.70
2   -6.18
3    0.25
4     NaN
5    0.25
Name: num, dtype: float64

print (pd.to_numeric(df['num'], errors='coerce').isnull())
0    False
1    False
2    False
3    False
4     True
5    False
Name: num, dtype: bool

print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
  N-D     num unit
4  Q5  sum(d)   UD

Another solution with isinstance and apply:

print (df[df['num'].apply(lambda x: isinstance(x, str))])
  N-D     num unit
4  Q5  sum(d)   UD
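To see the caveat from the note above, compare the two solutions on a column where some numbers are stored as strings (a small sketch):

```python
import pandas as pd

df = pd.DataFrame({'num': [-1.48, '1', 'sum(d)']})

# to_numeric parses the string '1', so only 'sum(d)' is flagged.
by_coerce = df[pd.to_numeric(df['num'], errors='coerce').isnull()]
print(by_coerce['num'].tolist())   # ['sum(d)']

# isinstance flags every str, including the numeric-looking '1'.
by_type = df[df['num'].apply(lambda x: isinstance(x, str))]
print(by_type['num'].tolist())     # ['1', 'sum(d)']
```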



Old topic, but if the numbers have been converted to str, type(x) == str no longer distinguishes them, because every value is a string.

Instead, it's better to use isnumeric() or isdigit():

df = df[df['num'].apply(lambda x: not x.isnumeric())]
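For reference, str.isnumeric accepts only unsigned, digit-only strings, which is worth knowing before relying on it (a quick check):

```python
# str.isnumeric is True only for unsigned, digit-only strings,
# so decimals and negative numbers are not treated as numeric.
checks = {s: s.isnumeric() for s in ['25', '4.5', '-1', 'sum(d)']}
print(checks)  # {'25': True, '4.5': False, '-1': False, 'sum(d)': False}
```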

I tested all three approaches on my own dataframe with 200k+ rows, assuming numbers have been converted to 'str' by pd.read_csv().

def f1():
    df[pd.to_numeric(df['num'], errors='coerce').isnull()]

def f2():
    df[~df.num.str.match(r'^-?(\d*\.?\d+|\d+\.?\d*)$')]

def f3():
    df[df['num'].apply(lambda x: not x.isnumeric())]

I got the following execution times by running each function 10 times:

timeit.timeit(f1, number=10)
1.04128568888882

timeit.timeit(f2, number=10)
1.959099448888992

timeit.timeit(f3, number=10)
0.48741375999998127

Conclusion: the fastest method is isnumeric(); the slowest is the regular-expression method.

=========================================

Edit: As @set92 commented, isnumeric() works for integers only. So the fastest applicable function is pd.to_numeric(), a universal solution that works for any type of numeric value.

It is possible to define an isfloat() function in Python, but it will be slower than the built-in functions, especially for big DataFrames.

import timeit
import pandas as pd

tmp = ['4.0', '4', '4.5', '1', 'test'] * 200000
df = pd.DataFrame(data=tmp, columns=['num'])

def isfloat(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def f1():
    df[pd.to_numeric(df['num'], errors='coerce').isnull()]

def f2():
    df[df['num'].apply(lambda x: not isfloat(x))]

def f3():
    df[~df.num.str.match(r'^-?(\d*\.?\d+|\d+\.?\d*)$')]

print('to_numeric:', timeit.timeit(f1, number=10))
print('isfloat:', timeit.timeit(f2, number=10))
print('regular exp:', timeit.timeit(f3, number=10))

Results:

to_numeric: 8.303612694763615
isfloat: 9.972200270603594
regular exp: 11.420604273894583

5 Comments

It shouldn't work: df['num'] is float, and isnumeric() and isdigit() only work for integers. The only solution I found is to define a function:

def isfloat(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

and with it change your solution to df['num'].loc[df['num'].apply(lambda x: not isfloat(x))]
But pd.to_numeric() works without any problem; it's a built-in function and fast. No need to define one!
With pd.to_numeric I got an error on the strings (ValueError: Unable to parse string "Unrated" at position 32). If you want to try it, the dataset was the ramen ratings one on Kaggle. The problem was that the Stars feature has float numbers and 3 cells with "Unrated", so I couldn't find a one-liner to count how many rows were not numbers.
You should use pd.to_numeric(..., errors='coerce') to skip those errors.
Only df[pd.to_numeric(df['num'], errors='coerce').isnull()] handles floats as well.

I used

df = df[df['num'].apply(lambda x: type(x) == str)]

and now df is

  N-D     num unit
4  Q5  sum(d)   UD



Assuming these are strings, you can filter based on a regular expression match of a floating point number.

df[~df.num.str.match(r'^-?(\d*\.?\d+|\d+\.?\d*)$')]

  N-D     num unit
4  Q5  sum(d)   UD
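A runnable sketch of this, with everything already stored as strings (since .str methods return NaN for non-string elements) and the pattern written as a raw string:

```python
import pandas as pd

# Assumes the column values are all strings.
df = pd.DataFrame({'num': ['-1.48', '1.70', 'sum(d)', '0.25']})

# Keep rows whose 'num' does NOT look like an optionally signed float.
pattern = r'^-?(\d*\.?\d+|\d+\.?\d*)$'
non_numeric = df[~df['num'].str.match(pattern)]
print(non_numeric['num'].tolist())  # ['sum(d)']
```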



There are many ways to detect non-numeric values in a pandas DataFrame column; here is one.

df[~df['num'].map(lambda x:x.isnumeric())]
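Note this assumes every value in 'num' is already a string of digits (e.g. after read_csv); on the question's mixed column the float entries would raise AttributeError. A sketch under that assumption:

```python
import pandas as pd

# All-string column of whole numbers plus one stray expression.
df = pd.DataFrame({'num': ['12', '7', 'sum(d)', '3']})

# isnumeric() is False for anything that is not digit-only.
flagged = df[~df['num'].map(lambda x: x.isnumeric())]
print(flagged['num'].tolist())  # ['sum(d)']
```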

