
I am cleaning the data in a .csv file before running analytics on it. In Python, I am trying to delete the rows that have null values in any of their columns.

Sample file:

        Unnamed: 0                          2012    2011    2010    2009    2008    2005
    0   United States of America            760739  752423  781844  812514  843683  862220
    1   Brazil                              732913  717185  715702  651879  649996  NaN
    2   Germany                             520005  513458  515853  519010  518499  494329
    3   United Kingdom (England and Wales)  310544  336997  367055  399869  419273  541455
    4   Mexico                              211921  212141  230687  244623  250932  239166
    5   France                              193081  192263  192906  193405  187937  148651
    6   Sweden                              87052   89457   87854   86281   84566   72645
    7   Romania                             17219   12299   12301   9072    9457    8898
    8   Nigeria                             15388   NaN     18093   14075   14692   NaN

So far I have used:

from pandas import read_csv
link = "https://docs.google.com/spreadsheets......csv"
data = read_csv(link)
data.head(100000)

How can I delete these rows?
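The "Unnamed: 0" header suggests the CSV's first header cell is blank, with that column holding the country names. That is an assumption, since the real file is not shown. If it holds, read_csv's index_col=0 makes that column the row index, so every remaining column is a numeric year column. This matters for the answers below, because null and value checks then only see the yearly figures. The CSV text here is a hypothetical stand-in built from three rows of the table.

```python
import io

import pandas as pd

# Hypothetical CSV text mimicking the question's layout: the first header
# cell is empty, which is why read_csv names that column "Unnamed: 0".
csv_text = """,2012,2011,2010,2009,2008,2005
United States of America,760739,752423,781844,812514,843683,862220
Brazil,732913,717185,715702,651879,649996,
Nigeria,15388,,18093,14075,14692,
"""

# Default behaviour: country names land in a column called "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])  # Unnamed: 0

# index_col=0 makes the first column the row index instead, so every
# remaining column holds the numeric yearly values.
data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(data.loc["Brazil"])
```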

  • Can you explain exactly what you are trying to do? Commented Sep 23, 2014 at 9:01
  • I have collected statistics on a condition for all the countries and will be running analytics on this data. Before that, I would like to drop every country with missing data, meaning any country whose value is 0 for a specific year. Please let me know if you need more info. Commented Sep 23, 2014 at 9:19
  • For example, if the USA row has an amount of 0 in any year, I would like to drop the USA so that it does not appear in the output. If none of the UK values are 0, I will not drop that row, and the UK will still be in the output. Commented Sep 23, 2014 at 9:23
  • See stackoverflow.com/questions/21468582/…; it should help. Commented Sep 23, 2014 at 9:41

2 Answers


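Once your data is loaded, you just need to work out which rows to remove. np.isnan only accepts numbers, so apply it to the numeric (year) columns rather than to the whole DataFrame, which also contains the country names:

import numpy as np
bad_rows = np.isnan(data.select_dtypes(include=[np.number]).to_numpy()).any(axis=1)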

Then:

data[~bad_rows].head(100)
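Here is a self-contained sketch of the same boolean-mask idea on a few rows copied from the question's table. It swaps np.isnan for pandas' own isna(), which also accepts the text column, so no numeric-column selection is needed.

```python
import numpy as np
import pandas as pd

# A few rows copied from the question's table (two year columns only).
data = pd.DataFrame({
    "Unnamed: 0": ["Germany", "Brazil", "Nigeria"],
    "2011": [513458, 717185, np.nan],
    "2005": [494329, np.nan, np.nan],
})

# isna() works on every column, including the text column of country names,
# so there is no need to select the numeric columns first.
bad_rows = data.isna().any(axis=1)

# ~ inverts the mask: keep only rows with no missing values at all.
clean = data[~bad_rows]
print(clean["Unnamed: 0"].tolist())  # ['Germany']
```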

You need to use the dropna method to remove these rows. Passing how='any' as an argument removes a row if any of its values is null, while how='all' removes a row only if all of its values are null.

cleaned_data = data.dropna(how='any')
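Here is a small sketch contrasting the two how settings on rows from the question's table (2011 and 2005 columns only). The country names are placed in the index here as an illustrative choice, so dropna checks only the year values.

```python
import numpy as np
import pandas as pd

# Rows from the question's table; countries as the index so that dropna
# looks at the year values alone.
data = pd.DataFrame(
    {"2011": [513458, 717185, np.nan], "2005": [494329, np.nan, np.nan]},
    index=["Germany", "Brazil", "Nigeria"],
)

# how='any': drop a row if at least one value is missing.
print(data.dropna(how="any").index.tolist())  # ['Germany']

# how='all': drop a row only when every value is missing (Nigeria here).
print(data.dropna(how="all").index.tolist())  # ['Germany', 'Brazil']
```

If the country names stay in the "Unnamed: 0" column instead, that column is never null, so how='all' would never drop a row. In that case, dropna's subset parameter can restrict the check to the year columns.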

Edit 1.

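It's worth noting that you may not want to keep a separate copy of your cleaned data (i.e. cleaned_data = data.dropna(how='any')).

Instead, you can pass the inplace option, which modifies your original DataFrame and returns None. Note that pandas may still build the result as a new object internally, so this is mainly a convenience rather than a guaranteed memory saving.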

data.dropna(how='any', inplace=True)
data.head(100)
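This minimal sketch uses the same sample rows as above and shows what inplace=True actually returns.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {"2011": [513458, 717185, np.nan], "2005": [494329, np.nan, np.nan]},
    index=["Germany", "Brazil", "Nigeria"],
)

# inplace=True changes `data` itself and hands back None.
result = data.dropna(how="any", inplace=True)
print(result)               # None
print(data.index.tolist())  # ['Germany']
```

Because the return value is None, writing data = data.dropna(inplace=True) would replace your DataFrame with None. Use either the assignment form or inplace=True, but not both.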

1 Comment

This drops rows that contain nulls, but I need to drop rows based on their values. For example, if the USA row has an amount of 0 in any year, I would like to drop the USA. If none of the UK values are 0, I will not drop that row, and the UK will still be in the output.
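A possible sketch for the rule described in this comment: drop a country if any year value is 0 or missing. It assumes the first column holds the country names and every other column is a year. The sample is hypothetical: the 0 for Mexico is invented purely to illustrate, since the question's table contains no zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the question's layout. The 0 for Mexico is invented
# for illustration; the real table has no zeros.
data = pd.DataFrame({
    "Unnamed: 0": ["Germany", "Brazil", "Mexico"],
    "2012": [520005, 732913, 211921],
    "2005": [494329, np.nan, 0],
})

# Assumes the first column holds country names and the rest are years.
values = data[data.columns[1:]]

# Keep a row only if every year value is present (not NaN) and non-zero.
keep = (values.notna() & (values != 0)).all(axis=1)
clean = data[keep]
print(clean["Unnamed: 0"].tolist())  # ['Germany']
```

The notna() check is needed because NaN != 0 evaluates to True, so a zero test alone would let Brazil's missing 2005 value through.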
