Replacing DataFrame index values in python with conditional argument

Question

I'm trying to replace some string values in the index of a pandas DataFrame. The index labels are country names, and I want to replace strings like 'United Kingdom of Great Britain and Northern Ireland19' with 'United Kingdom'.

The data frame looks like this:

import pandas as pd

data = ['12', '13', '14', '15']
df = pd.DataFrame(data, index=['Republic of Korea', 'United States of America20', 'United Kingdom of Great Britain and Northern Ireland19', 'China, Hong Kong Special Administrative Region'], columns=['Country'])

I have tried:

d = {"Republic of Korea": "South Korea",
     "United States of America20": "United States",
     "United Kingdom of Great Britain and Northern Ireland19": "United Kingdom",
     "China, Hong Kong Special Administrative Region": "Hong Kong"}
df.index = df.index.str.replace(d)

Unfortunately, I just get an error message that replace is missing a positional argument.

jezrael · Accepted Answer · 2017-11-21 06:23:29Z

In pandas, the rename function is used to replace values in the index or columns:

df = df.rename(d)
print(df)
               Country
South Korea         12
United States       13
United Kingdom      14
Hong Kong           15
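
As a minimal sketch of the same call, assuming a frame shaped like the one in the question: passing the mapping through the `index=` keyword makes the target axis explicit, and any label that is missing from the dict is left unchanged. The partial dict below is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Country": ["12", "13", "14", "15"]},
    index=[
        "Republic of Korea",
        "United States of America20",
        "United Kingdom of Great Britain and Northern Ireland19",
        "China, Hong Kong Special Administrative Region",
    ],
)

# Illustrative partial mapping: only two of the four labels have an entry.
d = {
    "Republic of Korea": "South Korea",
    "United States of America20": "United States",
}

# rename returns a new frame; labels without a key in d pass through untouched.
renamed = df.rename(index=d)
print(renamed.index.tolist())
```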

For me, the timings are practically the same:

df = pd.concat([df] * 100000)

In [11]: %timeit df.rename(d)
10 loops, best of 3: 75.7 ms per loop

In [12]: %timeit pd.Series(df.index).replace(d)
10 loops, best of 3: 71.8 ms per loop

In [13]: %timeit pd.Series(df.index.values).replace(d)
10 loops, best of 3: 75.3 ms per loop

edited Nov 21, 2017 at 6:23

answered Nov 21, 2017 at 6:08

jezrael


2 Comments

cs95 Over a year ago

Can you please add pd.Series(df.index.values).replace(d) to your timeit list as well?

jezrael Over a year ago

Sure, no problem. Done.

cs95 · Accepted Answer · 2017-11-21 06:22:17Z

You could initialise a series and call pd.Series.replace:

df   
                                                   Country
Republic of Korea                                       12
United States of America20                              13
United Kingdom of Great Britain and Northern Ir...      14
China, Hong Kong Special Administrative Region          15


df.index = pd.Series(df.index).replace(d)
df

               Country
South Korea         12
United States       13
United Kingdom      14
Hong Kong           15
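
A short sketch of why this works where the attempt in the question did not, using a two-label index made up for illustration: Series.replace accepts a dict and swaps values that match a key exactly, whereas .str.replace expects one pattern and one replacement per call, so a dict would have to be applied one pair at a time.

```python
import pandas as pd

index = pd.Index(["Republic of Korea", "United States of America20"])
d = {
    "Republic of Korea": "South Korea",
    "United States of America20": "United States",
}

# Series.replace with a dict: whole values equal to a key are swapped.
replaced = pd.Index(pd.Series(index).replace(d))

# .str.replace needs (pat, repl), so the dict is applied pair by pair.
# regex=False treats each key as a literal substring, not a pattern.
one_at_a_time = index
for old, new in d.items():
    one_at_a_time = one_at_a_time.str.replace(old, new, regex=False)

print(replaced.tolist())
print(one_at_a_time.tolist())
```

Note that the .str.replace loop works on substrings, so it can also change labels that merely contain a key; the dict-based replace does not.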

Timings

df = pd.concat([df] * 100000)

%timeit df.rename(d)
10 loops, best of 3: 116 ms per loop

%timeit pd.Series(df.index).replace(d)
10 loops, best of 3: 96.7 ms per loop

I can squeeze out more speed using df.index.values:

%timeit pd.Series(df.index.values).replace(d)
10 loops, best of 3: 88 ms per loop

Timings will vary on your machine, so be sure to do your own tests before deciding what method to go with.
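
Outside IPython, one way to run such a comparison is the standard-library timeit module. The sketch below is only an assumption about how you might set it up: the frame size, the repeat counts, and the names `candidates` and `results` are arbitrary choices, not part of the measurements above.

```python
import timeit

import pandas as pd

d = {
    "Republic of Korea": "South Korea",
    "United States of America20": "United States",
}
small = pd.DataFrame({"Country": ["12", "13"]}, index=list(d))
big = pd.concat([small] * 10_000)  # smaller than the 100000 copies used above

# The three approaches compared in this thread.
candidates = {
    "rename": lambda: big.rename(index=d),
    "Series(index).replace": lambda: pd.Series(big.index).replace(d),
    "Series(index.values).replace": lambda: pd.Series(big.index.values).replace(d),
}

# Best of 3 repeats, 3 calls per repeat, reported as seconds per call.
results = {
    name: min(timeit.repeat(fn, number=3, repeat=3)) / 3
    for name, fn in candidates.items()
}

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.1f} ms per call")
```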

edited Nov 21, 2017 at 6:22

answered Nov 21, 2017 at 6:06

cs95

3 Comments

jezrael Over a year ago

Hmmm, what is your pandas version? I tested the timings too and they are very similar. I use 0.21.0 on Windows 7 with Python 3.

cs95 Over a year ago

@jezrael 0.21 on python3.4 (Ipython5), MacOS. My machine is a bit old so timings would vary.

Louis Bernstone Over a year ago

Thanks for your help. Great to see all the alternatives
