
I have a dataframe and 2 lists.

The 1st list gives the index values in the dataframe that I want to replace.

The 2nd list gives the values I want to replace them with.

I don't want to touch any of the other values.

Here is the dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({u'Afghanistan': 6532.0,
 u'Albania': 662.0,
 u'Andorra': 2.0,
 u'Angola': 2219.0,
 u'Antigua and Barbuda': 0.0,
 u'Argentina': 6.0,
 u'Armenia': 15.0,
 u'Australia': 108.0,
 u'Azerbaijan': 210.0,
 u'Bahamas': 0.0,
 u'Bahrain': 6.0,
 u'Bangladesh': 5098.0,
 u'Barbados': 0.0,
 u'Belarus': 21.0,
 u'Belize': 0.0,
 u'Benin': 4244.0,
 u'Bhutan': 418.0,
 u'Bolivia (Plurinational State of)': 122.0,
 u'Bosnia and Herzegovina': 43.0,
 u'Botswana': 2672.0,
 u'Brazil': 36.0,
 u'Brunei Darussalam': 42.0,
 u'Bulgaria': 46.0,
 u'Burkina Faso': 6074.0,
 u'Burundi': 18363.0,
 u'Cabo Verde': 2.0,
 u'Cambodia': 12237.0,
 u'Cameroon': 14629.0,
 u'Canada': 206.0,
 u'Central African Republic': 3207.0,
 u'Chad': 3546.0,
 u'Chile': 0.0,
 u'China': 71093.0,
 u'Colombia': 1.0,
 u'Congo': 1678.0,
 u'Cook Islands': 2.0,
 u'Costa Rica': 0.0,
 u'Croatia': 9.0,
 u'Cuba': 0.0,
 u'Cyprus': 0.0,
 u'Czechia': 9.0,
 u"C\xf4te d'Ivoire": 5729.0,
 u'Democratic Republic of the Congo': 8282.0,
 u'Denmark': 14.0,
 u'Djibouti': 183.0,
 u'Dominica': 0.0,
 u'Dominican Republic': 253.0,
 u'Ecuador': 0.0,
 u'Egypt': 2633.0,
 u'El Salvador': 0.0,
 u'Eritrea': 789.0,
 u'Estonia': 9.0,
 u'Ethiopia': 1660.0,
 u'France': 10000.0,
 u'Gabon': 15.0,
 u'Gambia': 336.0,
 u'Georgia': 50.0,
 u'Ghana': 23068.0,
 u'Greece': 56.0,
 u'Grenada': 0.0,
 u'Guatemala': 0.0,
 u'Guinea': 11294.0,
 u'Guyana': 0.0,
 u'Haiti': 992.0,
 u'Honduras': 0.0,
 u'Hungary': 1.0,
 u'Iceland': 0.0,
 u'India': 38835.0,
 u'Indonesia': 3344.0,
 u'Iran (Islamic Republic of)': 11874.0,
 u'Iraq': 726.0,
 u'Israel': 36.0,
 u'Italy': 1457.0,
 u'Jamaica': 0.0,
 u'Japan': 22497.0,
 u'Jordan': 32.0,
 u'Kazakhstan': 245.0,
 u'Kenya': 21002.0,
 u'Kiribati': 0.0,
 u'Kuwait': 6.0,
 u'Kyrgyzstan': 16.0,
 u"Lao People's Democratic Republic": 332.0,
 u'Latvia': 0.0,
 u'Lebanon': 5.0,
 u'Lesotho': 660.0,
 u'Liberia': 5977.0,
 u'Lithuania': 19.0,
 u'Luxembourg': 0.0,
 u'Madagascar': 35256.0,
 u'Malawi': 304.0,
 u'Malaysia': 6187.0,
 u'Maldives': 20.0,
 u'Mali': 1578.0,
 u'Malta': 2.0,
 u'Marshall Islands': 0.0,
 u'Mauritius': 0.0,
 u'Mexico': 30.0,
 u'Micronesia (Federated States of)': 0.0,
 u'Mongolia': 925.0,
 u'Morocco': 7368.0,
 u'Mozambique': 7375.0,
 u'Myanmar': 845.0,
 u'Namibia': 469.0,
 u'Nauru': 0.0,
 u'Nepal': 9397.0,
 u'Netherlands': 1019.0,
 u'New Zealand': 65.0,
 u'Nicaragua': 0.0,
 u'Niger': 21319.0,
 u'Nigeria': 212183.0,
 u'Niue': 0.0,
 u'Norway': 0.0,
 u'Oman': 15.0,
 u'Pakistan': 2064.0,
 u'Palau': 0.0,
 u'Panama': 0.0,
 u'Papua New Guinea': 7135.0,
 u'Paraguay': 0.0,
 u'Peru': 1.0,
 u'Philippines': 7120.0,
 u'Poland': 77.0,
 u'Portugal': 45.0,
 u'Qatar': 46.0,
 u'Republic of Korea': 32647.0,
 u'Republic of Moldova': 687.0,
 u'Romania': 35.0,
 u'Russian Federation': 4800.0,
 u'Rwanda': 2095.0,
 u'Saint Kitts and Nevis': 0.0,
 u'Saint Lucia': 0.0,
 u'Saint Vincent and the Grenadines': 0.0,
 u'San Marino': 1.0,
 u'Sao Tome and Principe': 0.0,
 u'Senegal': 5839.0,
 u'Serbia': 38.0,
 u'Sierra Leone': 3575.0,
 u'Singapore': 141.0,
 u'Slovakia': 0.0,
 u'Somalia': 3965.0,
 u'South Africa': 1459.0,
 u'Spain': 152.0,
 u'Sri Lanka': 16527.0,
 u'Sudan': 2875.0,
 u'Suriname': 0.0,
 u'Swaziland': 10.0,
 u'Sweden': 59.0,
 u'Syrian Arab Republic': 146.0,
 u'Tajikistan': 192.0,
 u'Thailand': 4074.0,
 u'The former Yugoslav republic of Macedonia': 36.0,
 u'Togo': 3578.0,
 u'Tonga': 0.0,
 u'Trinidad and Tobago': 0.0,
 u'Tunisia': 47.0,
 u'Turkey': 16244.0,
 u'Turkmenistan': 113.0,
 u'Uganda': 42554.0,
 u'Ukraine': 817.0,
 u'United Arab Emirates': 69.0,
 u'United Kingdom of Great Britain and Northern Ireland': 104.0,
 u'United Republic of Tanzania': 14649.0,
 u'United States of America': 85.0,
 u'Uruguay': 0.0,
 u'Uzbekistan': 80.0,
 u'Vanuatu': 9.0,
 u'Venezuela (Bolivarian Republic of)': 22.0,
 u'Viet Nam': 16512.0,
 u'Zambia': 30930.0,
 u'Zimbabwe': 1483.0}, orient = 'index')

Here is the 1st list:

list1 = [u'Bolivia (Plurinational State of)', u'Brunei Darussalam', u'Cabo Verde', u'China',
    u'Congo', u'Cook Islands', u'Czechia', u"C\xf4te d'Ivoire", 
    u"Democratic People's Republic of Korea", u'France', u'Iran (Islamic Republic of)', 
    u"Lao People's Democratic Republic", u'Micronesia (Federated States of)', u'Niue', 
    u'Republic of Korea', u'Republic of Moldova', u'Russian Federation', u'Sao Tome and Principe', 
    u'Serbia', u'Somalia', u'Syrian Arab Republic', u'The former Yugoslav republic of Macedonia', 
    u'United Kingdom of Great Britain and Northern Ireland', u'United Republic of Tanzania', 
    u'United States of America', u'Venezuela (Bolivarian Republic of)', u'Viet Nam']

Here is the 2nd list:

list2 = [u'Bolivia', u'Brunei', u'Cape Verde', u'China[1]', u'Democratic Republic of the Congo', 
    u'Cook Islands (NZ)', u'Czech Republic', u'Ivory Coast', u'North Korea', u'France[2]', 
    u'Iran', u'Laos', u'Federated States of Micronesia', u'Niue (NZ)', u'South Korea', 
    u'Moldova[3]', u'Russia', u'S\xe3o Tom\xe9 and Pr\xedncipe', u'Serbia[5]', 
    u'Somalia[6]', u'Syria', u'Macedonia', u'United Kingdom', u'Tanzania', 
    u'United States', u'Venezuela', u'Vietnam']

This is clearly the sort of thing Python excels at, and I suspect a simple for loop will do it, but I can't quite wrap my head around the logic (yet).

Any help gratefully appreciated!

Comments:
  • Not sure what has to be replaced where? Commented Apr 21, 2018 at 2:22
  • You could try to use the replace function in pandas. stackoverflow.com/questions/27060098/… Commented Apr 21, 2018 at 2:25
  • In the dataframe, some of the index values are not what I want. The 1st list identifies which index values I want to change, the second list identifies the values I want to change them to. Same number of items in each list - and their positions match. Commented Apr 21, 2018 at 2:32
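Since the two lists have the same length and their positions match, zip() can pair them into an old-name to new-name dictionary. A minimal sketch, using three pairs copied from the question's lists; the name `mapping` is illustrative. Note that zip() silently stops at the shorter input, so the explicit length check guards against a mismatch (on Python 3.10+, zip(list1, list2, strict=True) does the same check).

```python
# Three pairs copied from the question's lists (a trimmed sample)
list1 = ["Cabo Verde", "Czechia", "Viet Nam"]
list2 = ["Cape Verde", "Czech Republic", "Vietnam"]

# zip() stops at the shorter list without complaint, so check lengths first
if len(list1) != len(list2):
    raise ValueError(f"length mismatch: {len(list1)} vs {len(list2)}")

# Position i in list1 maps to position i in list2
mapping = dict(zip(list1, list2))
print(mapping)
```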

4 Answers

Answer (score 25)

Use:

df = df.rename(index=dict(zip(list1,list2)))
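A small check of that one-liner on a trimmed copy of the question's data (three rows only). Labels not in the mapping are left alone, and a key with no matching index label, such as "Democratic People's Republic of Korea" (which appears in list1 but not in the dataframe), is silently ignored with rename's default settings.

```python
import pandas as pd

# Three rows copied from the question's dataframe
df = pd.DataFrame.from_dict(
    {"Albania": 662.0, "Bolivia (Plurinational State of)": 122.0, "Viet Nam": 16512.0},
    orient="index",
)

list1 = ["Bolivia (Plurinational State of)", "Viet Nam", "Democratic People's Republic of Korea"]
list2 = ["Bolivia", "Vietnam", "North Korea"]

# Only labels found in the dict are renamed; "Albania" is untouched and
# the unmatched North Korea key has no effect
df = df.rename(index=dict(zip(list1, list2)))
print(df.index.tolist())  # ['Albania', 'Bolivia', 'Vietnam']
```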

1 Comment

Just brilliant! Exactly what I wanted to do, and no looping involved at all! So easy when you know how! THANK YOU
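For comparison, the explicit for loop the question was reaching for might look like the sketch below. It rebuilds the whole index label by label, swapping in a new name where the (illustrative) `mapping` dict has one and keeping every other label via dict.get's default. For a dict mapping this gives the same labels as rename(index=...), just with a Python-level loop.

```python
import pandas as pd

# Two rows copied from the question's dataframe
df = pd.DataFrame.from_dict({"Albania": 662.0, "Viet Nam": 16512.0}, orient="index")
list1 = ["Viet Nam"]
list2 = ["Vietnam"]

mapping = dict(zip(list1, list2))

# Walk the existing labels: use the new name if mapped, else keep the old one
new_labels = []
for label in df.index:
    new_labels.append(mapping.get(label, label))
df.index = new_labels

print(df.index.tolist())  # ['Albania', 'Vietnam']
```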
Answer (score 9)

Zip the two lists to create a dictionary that maps the old names to the new names.

Use pandas.DataFrame.rename with the replacements dictionary and all other arguments left at their defaults (a mapping passed positionally is applied to the index):

replacements = {l1:l2 for l1, l2 in zip(list1, list2)}

df2 = df.rename(replacements)
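Two things rename does not check by default, sketched on rows copied from the question. First, keys missing from the index are ignored; passing errors="raise" turns that into a KeyError, which would flag "Democratic People's Republic of Korea" (it is in list1 but not in the dataframe). Second, rename does not prevent duplicate labels: list1/list2 map "Congo" to "Democratic Republic of the Congo", which is already an index label, so two rows end up sharing it. The variable names here are illustrative.

```python
import pandas as pd

# Rows copied from the question's dataframe
df = pd.DataFrame.from_dict(
    {"Congo": 1678.0, "Democratic Republic of the Congo": 8282.0, "Niue": 0.0},
    orient="index",
)
replacements = {
    "Congo": "Democratic Republic of the Congo",
    "Niue": "Niue (NZ)",
    "Democratic People's Republic of Korea": "North Korea",
}

# errors="raise" reports keys that are not present in the index
raised = False
try:
    df.rename(replacements, errors="raise")
except KeyError as exc:
    raised = True
    print("not in index:", exc)

# With the default errors="ignore" the rename goes through,
# but two rows now carry the same label
renamed = df.rename(replacements)
print(renamed.index.is_unique)  # False
```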

Answer (score 1)

I believe there's an easier way now: pandas.DataFrame.set_index()

Usage:

df.set_index(pd.Index(list1))  # list1 must hold one new label per row

OR

# Use this if you want to assign one of the existing DataFrame columns as the index
df.set_index(df_column_id)
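Two caveats worth knowing before using set_index for this question, shown on a two-row sample. set_index replaces the entire index, so it needs a full-length sequence of labels rather than only the ones to change; for a partial rename you would first have to build that full list. Also, a plain Python list of strings is read as a list of column keys, so it fails with a KeyError when those names are not columns; wrapping the labels in a pd.Index (or a NumPy array) makes set_index treat them as one array of new index values. The column name "value" is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"value": [662.0, 16512.0]}, index=["Albania", "Viet Nam"])
new_labels = ["Albania", "Vietnam"]  # one label per row

# A plain list of strings is interpreted as column keys -> KeyError here
try:
    df.set_index(new_labels)
    list_failed = False
except KeyError as exc:
    list_failed = True
    print("KeyError:", exc)

# As an Index, the labels become the new index (length must match the rows)
df = df.set_index(pd.Index(new_labels))
print(df.index.tolist())  # ['Albania', 'Vietnam']
```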

Answer (score 1)

If the new labels are in a list:

  • convert the list into an array
  • use df.set_index(array)

If the new labels are in a column:

  • use df.set_index(column_label)

If the index should be a MultiIndex, pass a list of 1-D arrays (one per level) or a list of column labels, respectively; a single 2-D array is not accepted.

import pandas as pd
import numpy as np

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  index=list('abc'),
                  columns=list('AB'))
print(df)

# Replace the whole index with a same-length array of new labels
new_labels = np.array(list('uvw'))
df = df.set_index(new_labels)
print(df)

   A  B
a  1  2
b  3  4
c  5  6

   A  B
u  1  2
v  3  4
w  5  6
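For the MultiIndex case, a sketch on the same example frame: set_index takes a list of one-dimensional arrays (one per level), or a list of existing column labels. The level values 'x'/'y' and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  index=list('abc'),
                  columns=list('AB'))

# New labels for a two-level index: one 1-D array per level
level0 = np.array(['x', 'x', 'y'])
level1 = np.array(['u', 'v', 'w'])
df_multi = df.set_index([level0, level1])
print(df_multi)

# Or take the levels from existing columns by listing their labels
df_cols = df.set_index(['A', 'B'])
print(df_cols.index.nlevels)  # 2
```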

If you want to replace labels according to some correspondence:

  • prepare a mapping, e.g. a dictionary whose keys are the old labels and whose values are the corresponding new labels ({'a': 'u', 'b': 'v', 'c': 'w'})
  • use df.rename(mapping)
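A quick sketch of rename on the same example frame. With a dict, only the listed labels change, which is what the question needs (here 'b' is deliberately left out of the mapping and stays as-is). rename also accepts a function, which is then applied to every label; str.upper is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  index=list('abc'),
                  columns=list('AB'))

# Dict mapping: 'b' is not a key, so it is kept unchanged
partial = df.rename({'a': 'u', 'c': 'w'})
print(partial.index.tolist())  # ['u', 'b', 'w']

# Function mapping: applied to every index label
upper = df.rename(index=str.upper)
print(upper.index.tolist())  # ['A', 'B', 'C']
```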
