Replacing missing values & updating old values in a dataframe using Numpy and Pandas

Question

I'm trying to replace the missing values, which appear as '...' in my dataframe, with np.nan. I also want to update some old values, but my approach doesn't seem to work.

Here is my code:

import numpy as np 
import pandas as pd 


def func():
    energy=pd.ExcelFile('Energy Indicators.xls').parse('Energy')
    energy=energy.iloc[16:][['Environmental Indicators: Energy','Unnamed: 3','Unnamed: 4','Unnamed: 5']].copy()
    energy.columns=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    o="..."
    n=np.NaN

    # Trying to replace missing values with np.nan values 
    energy[energy['Energy Supply']==o]=n


    energy['Energy Supply']=energy['Energy Supply']*1000000


    # Here, I want to replace old values by new ones ==> Same problem 
    old=["Republic of Korea","United States of America","United Kingdom of " 
                                +"Great Britain and Northern Ireland","China, Hong "
                                +"Kong Special Administrative Region"]
    new=["South Korea","United States","United Kingdom","Hong Kong"]
    for i in range(0,4):
        energy[energy['Country']==old[i],'Country']=new[i]


    return energy

Here is the .xls file I'm working on: https://drive.google.com/file/d/0B80lepon1RrYeDRNQVFWYVVENHM/view?usp=sharing

cs95 · Accepted Answer · 2017-10-22 00:12:38Z


I'd do this with a regex-based df.replace:

energy = energy.replace(r'\s*\.+\s*', np.nan, regex=True)
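One caveat with the pattern above: it is not anchored, so it could also match a period that appears inside ordinary text, such as an abbreviation in a country name. A minimal sketch of an anchored variant follows, assuming the placeholder cells contain only dots and optional surrounding whitespace. The `sample` frame and its values are made up for illustration and are not taken from the spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for a slice of the spreadsheet.
sample = pd.DataFrame({
    "Country": ["Aruba", "St. Example", "Guam"],
    "Energy Supply": [12, " ... ", "..."],
})

# ^ and $ anchor the pattern, so only cells that consist entirely of dots
# (plus optional surrounding whitespace) are treated as missing.
cleaned = sample.replace(r"^\s*\.+\s*$", np.nan, regex=True)

print(cleaned)
```

With the anchors in place, a name like "St. Example" is left alone because the whole cell is not made of dots.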

MaxU proposed an alternative that works if the placeholder cells contain exactly the three dots, with no surrounding whitespace or other characters:

energy = energy.replace('...', np.nan, regex=False)
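For the second half of the question (renaming countries), here is a rough sketch of one possible approach, not part of the original answer. In the question's code, `energy[mask] = n` assigns to every column of the matching rows, and `energy[mask, 'Country']` is not a row-and-column selection that plain `[]` indexing supports; `.loc[mask, 'Country']` is the usual form for that. The sketch avoids the loop by passing a dict to `Series.replace`. The sample rows and the `renames` name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the parsed spreadsheet.
energy = pd.DataFrame({
    "Country": ["Republic of Korea", "United States of America", "Aruba"],
    "Energy Supply": [11007, "...", 12],
})

# Replace the placeholder in this one column only, then make it numeric
# so the multiplication below is arithmetic rather than string repetition.
energy["Energy Supply"] = pd.to_numeric(
    energy["Energy Supply"].replace("...", np.nan)
)
energy["Energy Supply"] = energy["Energy Supply"] * 1_000_000

# Map old names to new ones; names not in the dict are left unchanged.
renames = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
    "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
    "China, Hong Kong Special Administrative Region": "Hong Kong",
}
energy["Country"] = energy["Country"].replace(renames)

print(energy)
```

Limiting the replacement to the 'Energy Supply' column keeps the 'Country' values intact for rows where the supply figure is missing.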

edited Oct 22, 2017 at 0:12

answered Oct 21, 2017 at 23:34

cs95


3 Comments

MaxU - stand with Ukraine Over a year ago

I think it should be energy = energy.replace('...', np.nan, regex=False)

cs95 Over a year ago

@MaxU regex is False by default, meaning there was something wrong with the column values (possibly leading whitespaces), and so I decided to go with regex. Will add yours in as well!

sali333 Over a year ago

energy = energy.replace('...', np.nan) Works well
