The data is adult income from census data; rows look like this:
31, Private, 84154, Some-college, 10, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 38, NaN, >50K
48, Self-emp-not-inc, 265477, Assoc-acdm, 12, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 40, United-States, <=50K
I'm trying to remove all rows with NaNs from a DataFrame loaded from a CSV file in pandas.
>>> import pandas as pd
>>> income = pd.read_csv('income.data')
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object)
>>> income.dropna(how='any') # should drop all rows with NaNs
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) # what??
>>> income = income.dropna(how='any') # ok, maybe reassignment will work?
>>> income['type'].unique()
array([ State-gov, Self-emp-not-inc, Private, Federal-gov, Local-gov,
NaN, Self-emp-inc, Without-pay, Never-worked], dtype=object) # what??
I tried with a smaller example.csv:
label,age,sex
1,43,M
-1,NaN,F
1,65,NaN
And dropna() worked just fine here for both categorical and numerical NaNs. What is going on? I'm new to Pandas, just learning the ropes.
Assign income.dropna(how='any') to a variable and check the values on that. dropna() is not inplace by default (I think the inplace option may have been added after 0.12).
df.dropna(thresh=1)? More info about your data would be good.
With na_values=" NaN" in the CSV import, dropna works fine.
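The sample rows separate fields with a comma followed by a space, so the missing marker probably reaches pandas as the string " NaN" (with a leading space) rather than as a real missing value. Here is a minimal sketch of that reading and two possible fixes. The two-row sample and the column names are made up for illustration; only read_csv, its na_values and skipinitialspace parameters, and dropna are actual pandas API.

```python
import io

import pandas as pd

# Hypothetical sample in the same ", "-separated layout as the question's rows.
raw = (
    "31, Private, 84154, NaN, >50K\n"
    "48, Self-emp-not-inc, 265477, United-States, <=50K\n"
)
cols = ["age", "type", "fnlwgt", "country", "income"]  # assumed column names

# Default parsing keeps the leading space, so the field is the text " NaN",
# not a missing value, and dropna() has nothing to drop.
as_text = pd.read_csv(io.StringIO(raw), header=None, names=cols)
print(as_text["country"].isna().sum())   # 0
print(len(as_text.dropna(how="any")))    # 2

# Option 1: declare " NaN" (with the space) as a missing-value marker.
marked = pd.read_csv(io.StringIO(raw), header=None, names=cols,
                     na_values=" NaN")

# Option 2: skip the space after each delimiter, so the field is plain "NaN",
# which pandas treats as missing by default.
stripped = pd.read_csv(io.StringIO(raw), header=None, names=cols,
                       skipinitialspace=True)

# dropna() returns a new frame; the original is left unchanged.
cleaned = marked.dropna(how="any")
print(len(marked), len(cleaned))                 # 2 1
print(len(stripped.dropna(how="any")))           # 1
```

Option 2 also removes the leading space from the other text columns (" Private" becomes "Private"), which may be what you want for later comparisons.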