Python Pandas DataFrame how to Pivot

Question

Dear amazing hackers of the world,

I'm a newbie, and can't figure out which python/pandas function can achieve the "transformation" I want. Showing you what I have ("original") and what kind of result I want ("desired") is better than a lengthy description (I think and hope).

import pandas as pd

original DataFrame input

df_orig = pd.DataFrame()
df_orig["Treatment"] = ["C", "C", "D", "D", "C", "C", "D", "D"]
df_orig["TimePoint"] = [24, 48, 24, 48, 24, 48, 24, 48]
df_orig["AN"] = ["ALF234","ALF234","ALF234","ALF234","XYK987","XYK987","XYK987","XYK987"]
df_orig["Bincode"] = [33,33,33,33,44,44,44,44]
df_orig["BC_all"] = ["33.7","33.7","33.7","33.7","44.9","44.9","44.9","44.9"]
df_orig["RIA_avg"] = [0.202562419159333,0.281521224788666, 0.182828319454333,0.294909088002333,
                  0.105941322218833,0.247949961707,0.1267545610749,0.159711714967666]
df_orig["sum14N_avg"] = [4120031.79121666,3742633.37033333,4659315.47073666,4345668.76408666,
                     26307312.1188333,24089229.9177999,35367286.7322666,34093045.3129]


desired DataFrame input

df_wanted = pd.DataFrame()
df_wanted["AN"] = ["ALF234","XYK987"]
df_wanted["Bincode"] = [33,44]
df_wanted["BC_all"] = ["33.7","44.9"]
df_wanted["C_24_RIA_avg"] = [0.202562419159333, 0.105941322218833]
df_wanted["C_48_RIA_avg"] = [0.281521224788666,0.247949961707]
df_wanted["D_24_RIA_avg"] = [0.182828319454333,0.1267545610749]
df_wanted["D_48_RIA_avg"] = [0.294909088002333, 0.159711714967666]
df_wanted["C_24_sum14N_avg"] = [4120031.791, 26307312.12]
df_wanted["C_48_sum14N_avg"] = [3742633.37, 24089229.92]
df_wanted["D_24_sum14N_avg"] = [4659315.471, 35367286.73]
df_wanted["D_48_sum14N_avg"] = [4345668.764, 34093045.31]


Thank you very much for your support!!

Wilduck · Accepted Answer · 2014-09-30 18:09:12Z

2

I believe you want to pivot this using pd.pivot_table. See the examples on pivot tables to understand better how this works.

The following should give you what you want.

df_wanted = pd.pivot_table(
    df_orig,
    index=['AN', 'Bincode', 'BC_all'],
    columns=['Treatment', 'TimePoint'],
    values=['RIA_avg', 'sum14N_avg']
)

Note that the column names will not be transformed exactly as you stated in your output, but rather there will be a hierarchical index on both the columns and rows, which should be more convenient to work with.

Getting rows/columns/values out from this format is possible by using .loc:

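# all rows whose first index level (AN) is 'XYK987'
df_wanted.loc['XYK987', :]
# every sum14N_avg column, still split by Treatment and TimePoint
df_wanted.loc[:, 'sum14N_avg']
# RIA_avg for treatment C at time point 24, restricted to AN 'ALF234'
df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]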
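
If the flat column names from the question (C_24_RIA_avg and so on) are still needed, one possible approach is to join the levels of the hierarchical columns into single strings and then reset the row index. The sketch below rebuilds the question's data so it runs on its own; the names pivoted and flat are mine, not part of the original answer, and the naming pattern is assumed from the desired output in the question.

```python
import pandas as pd

# Same data as df_orig in the question, built in one step.
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666,
                0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707,
                0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333,
                   4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999,
                   35367286.7322666, 34093045.3129],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Each column label is a tuple (value name, treatment, time point);
# reorder and join the parts into "<treatment>_<timepoint>_<value>".
flat = pivoted.copy()
flat.columns = [
    f"{treatment}_{timepoint}_{value}"
    for value, treatment, timepoint in pivoted.columns
]

# Move AN, Bincode and BC_all from the row index back into ordinary columns.
flat = flat.reset_index()
print(flat.columns.tolist())
```

Because every (AN, Treatment, TimePoint) combination occurs exactly once in this data, the default mean aggregation of pivot_table leaves the values unchanged.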


Dan Allan · Accepted Answer · 2014-09-30 17:49:57Z

0

Your output is not aligned properly, so this is hard to follow. But it looks like a job for df.groupby('AN').mean() or something like that. Read the docs on Group By.
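
To make the groupby idea concrete, here is a rough sketch on a trimmed copy of the question's data (only the RIA_avg column is kept for brevity). It is my own illustration, not code from this answer, and the names df, per_an and wide are assumptions. Grouping by AN alone averages over every treatment and time point, so to keep one column per combination the other keys have to be grouped as well and then unstacked.

```python
import pandas as pd

# Trimmed copy of the question's data: identifiers plus RIA_avg only.
df = pd.DataFrame({
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Treatment": ["C", "C", "D", "D"] * 2,
    "TimePoint": [24, 48] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666,
                0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707,
                0.1267545610749, 0.159711714967666],
})

# One mean per AN: treatments and time points are averaged together.
per_an = df.groupby("AN")["RIA_avg"].mean()

# Group by all three keys, then move Treatment and TimePoint into the columns.
wide = (
    df.groupby(["AN", "Treatment", "TimePoint"])["RIA_avg"]
      .mean()
      .unstack(["Treatment", "TimePoint"])
)
print(wide)
```

The second result has one row per AN and one column per (Treatment, TimePoint) pair, which is the same shape pd.pivot_table produces for a single value column.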
