removing NaN values in python pandas

Question

The data is adult income from a census; rows look like:

31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K

I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.

>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object)
>>> income.dropna(how='any') # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??
>>> income = income.dropna(how='any') # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov,  Self-emp-not-inc,  Private,  Federal-gov,  Local-gov,
    NaN,  Self-emp-inc,  Without-pay,  Never-worked], dtype=object) # what??

I tried with a smaller example.csv:

label,age,sex
1,43,M
-1,NaN,F
1,65,NaN

And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to Pandas, just learning the ropes.

Try assigning the result of income.dropna(how='any') to a variable and checking the values on that. dropna() is not in-place by default (I think the inplace option may have been added after 0.12).
– TomAugspurger, Commented Nov 18, 2013 at 17:13
Tried df.dropna(thresh=1)? More info about your data would be good.
– dorvak, Commented Nov 18, 2013 at 17:22
I just copy-pasted your data from above into a blank CSV and imported it into pandas. It looks like the "NaN" is read as a string with a leading whitespace, " NaN". Use na_values=" NaN" in the CSV import, then dropna works fine.
– dorvak, Commented Nov 18, 2013 at 17:40
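
One way to check the diagnosis in the last comment is to look at the raw strings pandas actually stored. The sketch below is only an illustration: it uses two shortened inline rows as a stand-in for income.data, and the column names are made up, since the question never shows a header.

```python
import io
import pandas as pd

# Inline stand-in for income.data: note the space after every comma.
raw = (
    "31, Private, 84154, NaN, >50K\n"
    "48, Self-emp-not-inc, 265477, United-States, <=50K\n"
)

# Hypothetical column names, chosen for this example only.
cols = ["age", "type", "fnlwgt", "country", "label"]
income = pd.read_csv(io.StringIO(raw), header=None, names=cols)

# A list of Python strings prints with quotes, so the leading space is visible.
country_values = income["country"].tolist()
print(country_values)

# The cell holds the text " NaN", not a missing value, so nothing is dropped.
n_missing = int(income["country"].isna().sum())
n_after_drop = len(income.dropna(how="any"))
print(n_missing, n_after_drop)
```

Printing the values as a plain list is what exposes the problem: the array repr in the question hides the quotes, so " NaN" looks like a real NaN.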

dorvak · Accepted Answer · 2013-11-18 17:49:36Z

8

As I wrote in the comment: the "NaN" has a leading whitespace (at least in the data you provided). Therefore, you need to specify the na_values parameter in the read_csv function.

Try this one:

df = pd.read_csv("income.csv", header=None, na_values=" NaN")

This is also why your second example works: there is no leading whitespace there.
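
A minimal sketch of the effect, assuming inline data shaped like the question's rows (shortened to four fields) rather than the real file:

```python
import io
import pandas as pd

# Shortened sample rows; every field after the first starts with a space.
raw = (
    "31, Private, NaN, >50K\n"
    "48, Self-emp-not-inc, United-States, <=50K\n"
)

# Treat the literal text " NaN" (with its leading space) as missing.
df = pd.read_csv(io.StringIO(raw), header=None, na_values=" NaN")

n_missing = int(df[2].isna().sum())   # column 2 now holds a real NaN
clean = df.dropna(how="any")          # dropna returns a new DataFrame

# The other string fields still carry their leading space.
first_type = clean[1].iloc[0]
print(n_missing, len(clean), repr(first_type))
```

Note that na_values only fixes the missing-value marker. The remaining string values keep their leading space, so comparisons such as df[1] == "Private" would still fail.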

edited Nov 18, 2013 at 17:49

answered Nov 18, 2013 at 17:43

dorvak


2 Comments

lollercoaster Over a year ago

ah yep...that does it. is there a way to make pandas strip elements in CSVs? that would seem like a fairly common task (one I just expected to be built in).

dorvak Over a year ago

No, not by default, I guess (in some cases, whitespace may be useful). But you could use pd.read_csv(StringIO(data), skipinitialspace=True) (i.e. the skipinitialspace option), or you could try using ", " or a regular expression as a custom separator.
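
Both suggestions from this comment can be sketched as follows. The sample rows are an inline stand-in for the question's file, and the regular expression is just one plausible choice of separator, not the only one.

```python
import io
import pandas as pd

raw = (
    "31, Private, NaN, >50K\n"
    "48, Self-emp-not-inc, United-States, <=50K\n"
)

# Option 1: skip the whitespace that follows each delimiter.
# "NaN" then matches pandas' default missing-value markers.
df_skip = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

# Option 2: a regular-expression separator that swallows spaces around the comma.
# Regex separators need the slower Python parsing engine.
df_regex = pd.read_csv(
    io.StringIO(raw), header=None, sep=r"\s*,\s*", engine="python"
)

skip_types = df_skip[1].tolist()
skip_missing = int(df_skip[2].isna().sum())
regex_missing = int(df_regex[2].isna().sum())
print(skip_types, skip_missing, regex_missing)
```

With either option the string fields come back without the leading space, so no custom na_values entry is needed and dropna() behaves as the question expected.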

Francis Odero · Answer · 2022-07-04 04:52:47Z

2

Drop all rows with NaN values

df2=df.dropna()
df2=df.dropna(axis=0)

Reset index after drop

df2=df.dropna().reset_index(drop=True)

Drop rows in which all values are NaN

df2=df.dropna(how='all')

Drop rows that have NaN values in selected columns

df2=df.dropna(subset=['length','Height'])
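
To see what each call above keeps, here is a small sketch on a made-up frame. The length and Height columns follow the names used in the subset example; the label column and all values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: row 1 is complete, row 3 is entirely missing.
df = pd.DataFrame({
    "length": [np.nan, 2.0, 3.0, np.nan],
    "Height": [10.0, 20.0, np.nan, np.nan],
    "label": ["a", "b", "c", None],
})

any_dropped = df.dropna()                               # keeps only row 1
reindexed = df.dropna().reset_index(drop=True)          # same row, index restarts at 0
all_dropped = df.dropna(how="all")                      # drops only row 3
subset_dropped = df.dropna(subset=["length", "Height"]) # keeps only row 1

print(any_dropped.index.tolist(), reindexed.index.tolist())
print(all_dropped.index.tolist(), subset_dropped.index.tolist())
print(len(df))  # the original frame is unchanged
```

Each call returns a new DataFrame and leaves df untouched, so the result has to be assigned. These calls also only remove real missing values; a string such as " NaN" read from a CSV with leading spaces is not affected by any of them.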

edited Jul 4, 2022 at 4:52

answered Mar 18, 2021 at 19:51

Francis Odero


1 Comment

BcK Over a year ago

Try your code before posting your answer. Your code will not remove NaN values.
