
Dear amazing hackers of the world,

I'm a newbie and can't figure out which Python/pandas function can achieve the "transformation" I want. Showing you what I have ("original") and what kind of result I want ("desired") is better than a lengthy description (I think and hope).

import pandas as pd

Original DataFrame input:

df_orig = pd.DataFrame()
df_orig["Treatment"] = ["C", "C", "D", "D", "C", "C", "D", "D"]
df_orig["TimePoint"] = [24, 48, 24, 48, 24, 48, 24, 48]
df_orig["AN"] = ["ALF234","ALF234","ALF234","ALF234","XYK987","XYK987","XYK987","XYK987"]
df_orig["Bincode"] = [33,33,33,33,44,44,44,44]
df_orig["BC_all"] = ["33.7","33.7","33.7","33.7","44.9","44.9","44.9","44.9"]
df_orig["RIA_avg"] = [0.202562419159333,0.281521224788666, 0.182828319454333,0.294909088002333,
                  0.105941322218833,0.247949961707,0.1267545610749,0.159711714967666]
df_orig["sum14N_avg"] = [4120031.79121666,3742633.37033333,4659315.47073666,4345668.76408666,
                     26307312.1188333,24089229.9177999,35367286.7322666,34093045.3129]


Desired DataFrame input:

df_wanted = pd.DataFrame()
df_wanted["AN"] = ["ALF234","XYK987"]
df_wanted["Bincode"] = [33,44]
df_wanted["BC_all"] = ["33.7","44.9"]
df_wanted["C_24_RIA_avg"] = [0.202562419159333, 0.105941322218833]
df_wanted["C_48_RIA_avg"] = [0.281521224788666,0.247949961707]
df_wanted["D_24_RIA_avg"] = [0.182828319454333,0.1267545610749]
df_wanted["D_48_RIA_avg"] = [0.294909088002333, 0.159711714967666]
df_wanted["C_24_sum14N_avg"] = [4120031.791, 26307312.12]
df_wanted["C_48_sum14N_avg"] = [3742633.37, 24089229.92]
df_wanted["D_24_sum14N_avg"] = [4659315.471, 35367286.73]
df_wanted["D_48_sum14N_avg"] = [4345668.764, 34093045.31]


Thank you very much for your support!!

2 Answers


I believe you want to pivot this using pd.pivot_table. See the pivot table examples in the pandas documentation to better understand how it works.

The following should give you what you want.

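# One row per (AN, Bincode, BC_all); one column per (value, Treatment, TimePoint).
# Note the column is spelled "TimePoint" in df_orig; "Timepoint" raises a KeyError.
df_wanted = pd.pivot_table(
    df_orig,
    index=['AN', 'Bincode', 'BC_all'],
    columns=['Treatment', 'TimePoint'],
    values=['RIA_avg', 'sum14N_avg']
)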

Note that the columns will not be named exactly as in your desired output (e.g. C_24_RIA_avg). Instead there will be a hierarchical index (MultiIndex) on both the rows and the columns, which is often more convenient to work with.

Getting rows/columns/values out from this format is possible by using .loc:

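df_wanted.loc['XYK987', :]                      # all columns for AN == 'XYK987'
df_wanted.loc[:, 'sum14N_avg']                  # every sum14N_avg column (all Treatment/TimePoint pairs)
df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]   # RIA_avg for ALF234, Treatment C, TimePoint 24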
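If you do need the flat column names from the question, one way to get them (a sketch, using the same pivot_table call with the column spelled TimePoint, as in df_orig) is to join each column tuple into a string and then move the row index back into ordinary columns with reset_index(). The names pivoted and flat are only illustrative.

```python
import pandas as pd

# Rebuild the question's df_orig (values copied from the question).
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

pivoted = pd.pivot_table(
    df_orig,
    index=["AN", "Bincode", "BC_all"],
    columns=["Treatment", "TimePoint"],
    values=["RIA_avg", "sum14N_avg"],
)

# Each column label is a tuple such as ('RIA_avg', 'C', 24);
# reorder its parts into the "C_24_RIA_avg" style used in the question.
pivoted.columns = [f"{treat}_{tp}_{value}" for value, treat, tp in pivoted.columns]

# Move AN, Bincode and BC_all from the row index back into regular columns.
flat = pivoted.reset_index()
print(flat.columns.tolist())
```

Because pivot_table sorts the column levels (value name, then Treatment, then TimePoint), the columns here come out in the same order as in df_wanted from the question. Each cell holds exactly one original row, so the default mean aggregation returns the original value unchanged.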

Your output is not aligned properly, so this is hard to follow. But it looks like a job for df.groupby('AN').mean() or something like that. Read the docs on Group By.
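Note that df.groupby('AN').mean() on its own would average the numeric columns over all the C/D and 24/48 rows of each AN rather than spreading them into separate columns. A groupby-based sketch that keeps them apart groups on every identifying column and then uses unstack to move Treatment and TimePoint into the columns; this is effectively the same reshape that pivot_table performs. The names grouped and wide are only illustrative.

```python
import pandas as pd

# Rebuild the question's df_orig (values copied from the question).
df_orig = pd.DataFrame({
    "Treatment": ["C", "C", "D", "D", "C", "C", "D", "D"],
    "TimePoint": [24, 48, 24, 48, 24, 48, 24, 48],
    "AN": ["ALF234"] * 4 + ["XYK987"] * 4,
    "Bincode": [33] * 4 + [44] * 4,
    "BC_all": ["33.7"] * 4 + ["44.9"] * 4,
    "RIA_avg": [0.202562419159333, 0.281521224788666, 0.182828319454333, 0.294909088002333,
                0.105941322218833, 0.247949961707, 0.1267545610749, 0.159711714967666],
    "sum14N_avg": [4120031.79121666, 3742633.37033333, 4659315.47073666, 4345668.76408666,
                   26307312.1188333, 24089229.9177999, 35367286.7322666, 34093045.3129],
})

# Group on every identifying column; each group holds exactly one row here,
# so mean() simply returns the original value.
grouped = df_orig.groupby(["AN", "Bincode", "BC_all", "Treatment", "TimePoint"])[
    ["RIA_avg", "sum14N_avg"]
].mean()

# Move Treatment and TimePoint from the row index into the columns,
# leaving one row per (AN, Bincode, BC_all).
wide = grouped.unstack(["Treatment", "TimePoint"])
print(wide.shape)
```

The result has a MultiIndex on both rows and columns, just like the pivot_table answer, so the same .loc lookups apply.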
