0

I am trying the take values from a data frame and make it into another. It is hard to explain what I am doing but I have made an example below. Please can someone help as I have lots of columns of I would like to reduce to a few. I want to end up with matrix pd.concat([df1,df2]). from df. Pre is a factor with 2 levels, 0forPOST or 1 for PRE, SPC is a factor with many levels. Thank you

df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
                   'Rate': {0: 100, 1: 500, 2: 200},
                   'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
                   'POST_SPC2': {0:10, 1:50, 2:80},
                   'POST_SPC3': {0:30, 1:40, 2:10}})


df1 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C3'},
                    'Rate': {0: 100, 1: 500, 2: 200},
                    'PRE': {0: 1, 1: 1, 2: 1},
                    'SPC': {0:1, 1:1, 2:1},
                    'Damage': {0:'NaN', 1:'NaN', 2:'NaN'}})

df2 = pd.DataFrame({'CPDID':{0: 'C1', 1: 'C2', 2: 'C2'},
                    'Rate': {0: 100, 1: 500, 2: 200},
                    'PRE': {0: 0, 1: 0, 2: 0},
                    'SPC': {0:2, 1:2, 2:2},
                    'Damage': {0:10, 1:50, 2:80}})

print(df)
print(pd.concat([df1,df2]))

print(df)
print(pd.concat([df1,df2]))

1 Answer 1

1

The core step is to transform the dataframe by .stack(). However, your desired dataframe requires quite a few steps to transform and extract column label values from the base df, as follows:

df = pd.DataFrame({'CPDID': {0: 'C1', 1: 'C2', 2: 'C3'},
                   'Rate': {0: 100, 1: 500, 2: 200},
                   'PRE_SPC1': {0: 'NaN', 1: 'NaN', 2: 'NaN'},
                   'POST_SPC2': {0:10, 1:50, 2:80},
                   'POST_SPC3': {0:30, 1:40, 2:10}})

df_out = df.set_index(['CPDID', 'Rate'])

# split 'PRE'/'POST' from 'SPCn' from column labels
df_out.columns = df_out.columns.str.split('_', expand=True)

# prepare for column name `PRE', 'SPC' for the related columns
df_out = df_out.rename_axis(('PRE', 'SPC'),  axis=1)

# main step to transform df by stacking and name the values as 'Damage'
df_out = df_out.stack(level=[0,1]).reset_index(name='Damage')

# transform the values of 'PRE'
df_out['PRE'] = df_out['PRE'].eq('PRE').astype(int)

# extract number from 'SPCn'
df_out['SPC'] = df_out['SPC'].str.extract(r'(\d)$')

# sort to the required sequence
df_out = df_out.sort_values('SPC', ignore_index=True)

Result:

print(df_out)

  CPDID  Rate  PRE SPC Damage
0    C1   100    1   1    NaN
1    C2   500    1   1    NaN
2    C3   200    1   1    NaN
3    C1   100    0   2   10.0
4    C2   500    0   2   50.0
5    C3   200    0   2   80.0
6    C1   100    0   3   30.0
7    C2   500    0   3   40.0
8    C3   200    0   3   10.0
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.