I have a dataframe energy with missing values in some columns. The missing values are represented by the string ... in the dataframe. I want to replace all of these values with np.NaN.

In [3]: import pandas as pd

In [4]: import numpy as np

In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
   ...: , 'ESC', '% Renewable'])

In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]: 
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000

To replace these values, I tried:

In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

I don't understand the warning, and I don't see any other way to achieve what I want. Any ideas?

  • FYI, you can pass na_values='...' to pd.read_excel. Commented Dec 21, 2016 at 16:23
  • Reopened because the question also says "I don't see any other way to achieve what I want to". Commented Dec 21, 2016 at 16:41
  • @jezrael How is that relevant? Commented Dec 21, 2016 at 17:00
  • I am not sure, but it seems to be a partial duplicate. Commented Dec 21, 2016 at 17:07

2 Answers

I think you need:

energy['ES'] = energy.loc[energy['ES'] != "...", 'ES'] 
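The warning in the question appears because energy[mask]['ES'] = value first builds a temporary object with energy[mask] and then assigns into that temporary object, so energy itself may be left unchanged. A minimal sketch of the single-step .loc assignment that the warning recommends, assuming a small made-up frame (the countries and numbers below are illustrative, not the real spreadsheet):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the numbers are made up.
energy = pd.DataFrame({
    'Country': ['Albania', 'American Samoa', 'Guam'],
    'ES': [102, '...', '...'],
    'ESC': [35, '...', '...'],
})

# Row selection and column selection happen in one .loc call,
# so the assignment targets the original frame rather than a copy.
energy.loc[energy['ES'] == '...', 'ES'] = np.nan

print(energy['ES'].isna().tolist())  # [False, True, True]
```

The one-liner above reaches the same result by a different route: its right-hand side contains only the rows that are not '...', and when that shorter Series is assigned back to the column, pandas aligns on the index and fills the rows that are absent with NaN.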

Another solution:

energy['ES'] = energy['ES'].mask(energy['ES'] == "...")

Or:

energy['ES'] = energy['ES'].replace({'...': np.nan})

But the best option is ayhan's comment:

you can pass na_values='...' to pd.read_excel
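To illustrate what na_values does without needing the Excel file, here is a rough sketch that uses pd.read_csv on in-memory text instead of pd.read_excel; read_csv accepts the same na_values parameter. The CSV content is made up for the example.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the Excel sheet.
raw = io.StringIO(
    "Country,ES,ESC\n"
    "Albania,102,35\n"
    "Guam,...,...\n"
)

# na_values tells the parser to treat the literal string '...' as missing.
df = pd.read_csv(raw, na_values=['...'])

print(df['ES'].isna().tolist())  # [False, True]
print(df['ES'].dtype)            # float64
```

Because the placeholder is converted while parsing, the column is read as numeric from the start and no replacement step is needed afterwards.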

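If Energy is your pandas dataframe, you can also try:

for col in ['ES', 'ESC', '% Renewable']:
    Energy[col] = pd.to_numeric(Energy[col], errors='coerce')

This converts every value that cannot be parsed as a number, including the ... placeholders, to NaN. It is applied only to the numeric columns here because looping over all of Energy.columns would also turn a text column such as Country entirely into NaN.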
