
I need to rid myself of all rows with a null value in column C. Here is the code:

import pandas as pd

infile = "C:\****"

df = pd.read_csv(infile)

A   B   C   D
1   1   NaN 3
2   3   7   NaN
4   5   NaN 8
5   NaN 4   9
NaN 1   2   NaN

There are two basic methods I have attempted.

Method 1 (source: How to drop rows of Pandas DataFrame whose value in certain columns is NaN):

df.dropna()

The result is an empty dataframe, which makes sense because there is a NaN value in every row.

df.dropna(subset=[3])

For this method I tried to play around with the subset value using both column index number and column name. The dataframe is still empty.
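For reference, `dropna`'s `subset` parameter expects column *labels*, not positional indices, which may be why the integer index attempt failed. A minimal sketch on a hypothetical frame mirroring the one above:

```python
import pandas as pd
import numpy as np

# Hypothetical frame matching the sample printed above
df = pd.DataFrame(
    {"A": [1, 2, 4, 5, np.nan],
     "B": [1, 3, 5, np.nan, 1],
     "C": [np.nan, 7, np.nan, 4, 2],
     "D": [3, np.nan, 8, 9, np.nan]}
)

# subset takes column labels, so pass the name:
out = df.dropna(subset=["C"])            # drops rows 0 and 2

# to address a column by position, translate it to a label first:
out2 = df.dropna(subset=[df.columns[2]])  # same result
```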

Method 2 (source: Deleting DataFrame row in Pandas based on column value):

df = df[df.C.notnull()]

Still results in an empty dataframe!

What am I doing wrong?

  • df.dropna(subset=['C']) Commented Apr 25, 2016 at 19:02
  • And the second method does not return an empty dataframe. Could it be the case your first attempt emptied the dataframe? Commented Apr 25, 2016 at 19:04
  • yep, method 2 should work as well Commented Apr 25, 2016 at 19:05
  • I am positive the dataframe is full because I have been printing it on the line directly prior. @MaxU df.dropna(subset=["C"]) prints a full dataframe, including null values! So frustrating. Commented Apr 25, 2016 at 19:23

2 Answers

import pandas as pd
import numpy as np

df = pd.DataFrame([[1, 1, np.nan, 3], [2, 3, 7, np.nan], [4, 5, np.nan, 8],
                   [5, np.nan, 4, 9], [np.nan, 1, 2, np.nan]],
                  columns=['A', 'B', 'C', 'D'])
df = df[df['C'].notnull()]
df

3 Comments

@EdChum He didn't like notnull() up above, so I gave him some variety :)
It looks to me the OP got an empty dataframe due to the first incorrect operation
While this code may answer the question, providing additional context regarding why and/or how it answers the question would significantly improve its long-term value. Please edit your answer to add some explanation.

This just proves that your method 2 works properly (at least with pandas 0.18.0):

In [100]: df
Out[100]:
     A    B    C    D
0  1.0  1.0  NaN  3.0
1  2.0  3.0  7.0  NaN
2  4.0  5.0  NaN  8.0
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [101]: df.dropna(subset=['C'])
Out[101]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [102]: df[df.C.notnull()]
Out[102]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [103]: df = df[df.C.notnull()]

In [104]: df
Out[104]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

2 Comments

Ok, so the difference must have to do with my dataset. Or is it possible my NaN values are not actually recognized as null? They were generated using pandas.merge.
@geolish, just print df.isnull() - you should see True values in the cells containing NaN's
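One way a merge or text import could produce values that look like NaN but fail the null check is if they arrived as literal "NaN" strings rather than real missing values. A minimal sketch of diagnosing and fixing that (the column name and data are hypothetical):

```python
import pandas as pd

# Hypothetical column where "NaN" is a string, not a real missing value
df = pd.DataFrame({"C": [1.0, "NaN", 3.0]})

print(df["C"].isnull().any())   # False - the string "NaN" is not null

# Coerce to numeric so the string becomes an actual missing value:
df["C"] = pd.to_numeric(df["C"], errors="coerce")
print(df["C"].isnull().any())   # True - dropna/notnull now see it

clean = df[df["C"].notnull()]   # keeps only the two real numbers
```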
