Delete rows based on values in column in python

Question

I am cleaning the data in a .csv file before performing analytics on it. I am trying to delete the rows that have null values in any of their columns, using Python.

Sample file:

    Unnamed: 0  2012    2011    2010    2009    2008    2005
0   United States of America    760739  752423  781844  812514  843683  862220
1   Brazil  732913  717185  715702  651879  649996  NaN
2   Germany     520005  513458  515853  519010  518499  494329
3   United Kingdom (England and Wales)  310544  336997  367055  399869  419273  541455
4   Mexico  211921  212141  230687  244623  250932  239166
5   France  193081  192263  192906  193405  187937  148651
6   Sweden  87052   89457   87854   86281   84566   72645
7   Romania     17219   12299   12301   9072    9457    8898
8   Nigeria     15388   NaN     18093   14075   14692   NaN

What I have used so far:

from pandas import read_csv
link = "https://docs.google.com/spreadsheets......csv"
data = read_csv(link)
data.head(100000)

How can I delete these rows?

I have collected statistics for all the countries for a condition and will be performing analytics on this data. Before that, I want to handle the missing data: if the value is 0 for a country in a specific year, I would like to drop that country from the analysis. Please let me know if you need more info.
– Srikanth Kadithota, Commented Sep 23, 2014 at 9:19
Suppose, for the USA row, the amount is 0 in any of the years: I would like to drop the USA, and it should not be present in the output. If, for the UK, none of the values is 0, then I will not drop that row, and the UK will still be in the output.
– Srikanth Kadithota, Commented Sep 23, 2014 at 9:23
Follow this: stackoverflow.com/questions/21468582/… It will help.
– Anup, Commented Sep 23, 2014 at 9:41

John Zwinck · Accepted Answer · 2014-09-23 09:02:49Z

0

Once you have your data loaded you just need to figure out which rows to remove:

# isnull() also handles the text column with country names, where np.isnan would raise a TypeError
bad_rows = data.isnull().any(axis=1)

Then:

data[~bad_rows].head(100)
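
As a rough, self-contained sketch of the boolean-mask approach above: the small DataFrame below is a hypothetical three-row excerpt modeled on the sample file (it is not loaded from the real CSV), and the name `kept` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt modeled on the sample file in the question
data = pd.DataFrame({
    "Unnamed: 0": ["Brazil", "Germany", "Nigeria"],
    "2012": [732913, 520005, 15388],
    "2011": [717185.0, 513458.0, np.nan],
    "2005": [np.nan, 494329.0, np.nan],
})

# True for every row that contains at least one missing value
bad_rows = data.isnull().any(axis=1)

# ~ inverts the mask, so only rows without missing values are kept
kept = data[~bad_rows]
print(kept)
```

In this excerpt only the Germany row has no missing values, so it is the only row left in `kept`.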


Ewan · Accepted Answer · 2014-09-23 09:10:21Z

0

You need to use the dropna method to remove these rows. Passing how='any' removes a row if any of its values is null, while how='all' removes a row only if all of its values are null.

cleaned_data = data.dropna(how='any')
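
To make the difference between the two `how` options concrete, here is a minimal sketch on a made-up two-column frame. The frame and the names `frame`, `any_dropped`, and `all_dropped` are assumptions for illustration and do not come from the question's data.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 is complete, row 1 is partly null, row 2 is entirely null
frame = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
})

# how="any": drop a row as soon as one value is null (rows 1 and 2 go)
any_dropped = frame.dropna(how="any")

# how="all": drop a row only when every value is null (only row 2 goes)
all_dropped = frame.dropna(how="all")

print(any_dropped)
print(all_dropped)
```

Note that in the question's file the country column is never null, so how='all' applied to the whole frame would not drop any row there.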

Edit 1.

It's worth noting that you may not want to create a copy of your cleaned data (i.e. cleaned_data = data.dropna(how='any')).

To avoid keeping a second DataFrame around, you can pass the inplace option, which modifies your original DataFrame and returns None.

data.dropna(how='any', inplace=True)
data.head(100)

edited Sep 23, 2014 at 9:10

answered Sep 23, 2014 at 9:03


1 Comment

Srikanth Kadithota Over a year ago

This drops the rows that contain null values, but I need to drop rows based on a specific value. Suppose, for the USA row, the amount is 0 in any of the years: I would like to drop the USA. If, for the UK, none of the values is 0, then I will not drop that row, and the UK will still be in the output.
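
One possible way to do what this comment describes, sketched under assumptions: the zeros below are invented for illustration (the sample file shows NaN, not 0), the first column is assumed to hold the country name, and `year_cols`, `has_zero`, and `no_zero_rows` are hypothetical names.

```python
import pandas as pd

# Hypothetical data with invented zeros to illustrate the filter
data = pd.DataFrame({
    "Unnamed: 0": [
        "United States of America",
        "United Kingdom (England and Wales)",
        "Mexico",
    ],
    "2012": [760739, 310544, 0],
    "2011": [0, 336997, 212141],
})

# Assumption: every column except the first one is a year column
year_cols = data.columns[1:]

# True for rows where at least one year column equals 0
has_zero = (data[year_cols] == 0).any(axis=1)

# Keep only the rows with no zero in any year
no_zero_rows = data[~has_zero]
print(no_zero_rows)
```

Here the USA and Mexico rows each contain a 0 and are dropped, while the UK row is kept. The same mask pattern works for any other sentinel value by changing the comparison.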
