Reshape Pandas Dataframe with duplicate Index

Question

Current Dataframe:

CountryName      IndicatorCode    Year         Value  
Arab World     TX.VAL.MRCH.RS.ZS  1960  1.646954e+01  
Arab World     TX.VAL.MRCH.R1.ZS  1960  2.260207e+00
Arab World     TX.VAL.MRCH.RS.ZS  1961  1.244584e+01
Arab World     TX.VAL.MRCH.R1.ZS  1961  1.860104e+00  
Zimbabwe       DT.DIS.OFFT.CD     2015  8.377700e+07
Zimbabwe       DT.INT.OFFT.CD     2015  2.321300e+07
Zimbabwe       DT.AMT.PROP.CD     2015  6.250000e+05

I want to turn each distinct value of the IndicatorCode column into its own column, filled with the corresponding entries from the Value column.
For example, after reshaping:

CountryName Year TX.VAL.MRCH.RS.ZS TX.VAL.MRCH.R1.ZS  
Arab World  1960 1.646954e+01      2.260207e+00
Arab World  1961 1.244584e+01      1.860104e+00

Final Dataframe columns should be:

[CountryName, Year, TX.VAL.MRCH.RS.ZS, TX.VAL.MRCH.R1.ZS, DT.DIS.OFFT.CD, DT.INT.OFFT.CD, DT.AMT.PROP.CD]

I tried using pivot, but without success. I also cannot use CountryName as the index, since it is not unique.

temp = indicators_df.pivot(columns='IndicatorCode',  values='Value')

Got ValueError: negative dimensions are not allowed

akuiper · Accepted Answer · 2017-03-05 23:12:34Z

4

You can use pivot_table, which accepts multiple columns for each of index, values and columns:

df.pivot_table("Value", ["CountryName", "Year"], "IndicatorCode").reset_index()

Some explanation:

The arguments here are passed by position, i.e. in the order values, index, columns. The same call with keywords:

df.pivot_table(values = "Value", index = ["CountryName", "Year"], columns = "IndicatorCode").reset_index()

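values is the column whose entries fill the cells of the final data frame. index lists the columns that get deduplicated and remain as columns in the result (after reset_index). columns is the column whose distinct values become the column headers of the result. Note that if several rows share the same index/columns combination, pivot_table aggregates them, using the mean by default.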
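To make the roles of the three arguments concrete, here is a minimal sketch that rebuilds the sample rows from the question and applies the same pivot_table call. The frame is typed in by hand for illustration (the question's real data is indicators_df), and clearing the columns name at the end is an optional cosmetic step, not part of the answer above.

```python
import pandas as pd

# Sample rows copied from the question; the real data would be indicators_df.
df = pd.DataFrame({
    "CountryName": ["Arab World"] * 4 + ["Zimbabwe"] * 3,
    "IndicatorCode": [
        "TX.VAL.MRCH.RS.ZS", "TX.VAL.MRCH.R1.ZS",
        "TX.VAL.MRCH.RS.ZS", "TX.VAL.MRCH.R1.ZS",
        "DT.DIS.OFFT.CD", "DT.INT.OFFT.CD", "DT.AMT.PROP.CD",
    ],
    "Year": [1960, 1960, 1961, 1961, 2015, 2015, 2015],
    "Value": [1.646954e+01, 2.260207e+00, 1.244584e+01, 1.860104e+00,
              8.377700e+07, 2.321300e+07, 6.250000e+05],
})

# One row per (CountryName, Year), one column per IndicatorCode.
wide = df.pivot_table(
    values="Value",
    index=["CountryName", "Year"],
    columns="IndicatorCode",
).reset_index()

# Optional: drop the leftover "IndicatorCode" label on the column axis.
wide.columns.name = None

print(wide)
```

Combinations that do not occur in the input (for example the DT.* indicators for Arab World) come out as NaN.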

edited Mar 5, 2017 at 23:12

answered Mar 5, 2017 at 22:50


1 Comment

timekeeper Over a year ago

If possible, can you explain what each argument stands for (a little explanation)? I have been reading about it, but I am not getting it. Thanks.

piRSquared · Accepted Answer · 2017-03-05 22:52:17Z

1

set_index + unstack

s = df.set_index(['CountryName', 'Year', 'IndicatorCode'])['Value']
# axis must be passed by keyword in recent pandas versions
s.unstack().reset_index().rename_axis(None, axis=1)
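As a rough, self-contained sketch of the same set_index + unstack idea, the snippet below uses only the four Arab World rows from the question, typed in by hand as an assumption for illustration. Unlike pivot_table, unstack does not aggregate: if the same (CountryName, Year, IndicatorCode) combination appears more than once, it raises a ValueError about duplicate entries.

```python
import pandas as pd

# Four sample rows from the question, entered by hand for illustration.
df = pd.DataFrame({
    "CountryName": ["Arab World"] * 4,
    "IndicatorCode": ["TX.VAL.MRCH.RS.ZS", "TX.VAL.MRCH.R1.ZS"] * 2,
    "Year": [1960, 1960, 1961, 1961],
    "Value": [1.646954e+01, 2.260207e+00, 1.244584e+01, 1.860104e+00],
})

# Move the three key columns into a MultiIndex, leaving a Series of values.
s = df.set_index(["CountryName", "Year", "IndicatorCode"])["Value"]

# unstack() pivots the innermost index level (IndicatorCode) into columns;
# reset_index() turns CountryName and Year back into ordinary columns;
# rename_axis(None, axis=1) removes the "IndicatorCode" label from the columns.
wide = s.unstack().reset_index().rename_axis(None, axis=1)

print(wide)
```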

answered Mar 5, 2017 at 22:52
