How to restructure dataframe to convert column values to row values based on condition

Question

I have a dataframe with 5 columns and want to convert 2 of the columns (Chemo and Surgery) based on their values (greater than 0) to rows (diagnosis series) and add the information like the individual id and diagnosis at age to the rows.

Here is my dataframe

import pandas as pd

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]

df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
print df

I have tried to get the values where the Chemo/Surgery is greater than 0 but when I tried to add it as a row, it doesn't work.

This is what I want the end result to be.

ID     Diagnosis Age at Diagnosis
0   A-1         Birth                0
1   A-1   Lung cancer               25
2   A-1         Chemo               25
3   A-1       Surgery               25
4   A-1         Death               50
5   A-2         Birth                0
6   A-2  Brain cancer               12
7   A-2         Chemo               12
8   A-2   Skin cancer               20
9   A-2         Chemo               20
10  A-2       Surgery               20
11  A-2   Current age               23
12  A-3         Birth                0
13  A-3  Brain cancer               30
14  A-3       Surgery               30
15  A-3   Lung cancer               33
16  A-3         Chemo               33
17  A-3   Current age               35

This is one of the things I have tried:

chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print df

Thank you for editing, @student. I was looking up how to do it. — had01
– had01, Commented Jun 4, 2019 at 20:34
I tried to get the ids where the chemo/surgery was greater than 0 and used that argument with the dataframe. — had01
– had01, Commented Jun 4, 2019 at 20:40
Can you show it? Much easier for us to help if we can see code attempts. — Will
– Will, Commented Jun 4, 2019 at 20:44

Quang Hoang · Accepted Answer · 2019-06-04 21:31:33Z

3

We can melt the two columns Chemo and Surgery, then drop all the zero and concat back:

# melt the two columns
new_df = df[['ID', 'Chemo', 'Surgery']].melt(id_vars='ID', 
                                             value_name='Age at Diagnosis',
                                             var_name='Diagnosis')
# filter out the zeros
new_df = new_df[new_df['Age at Diagnosis'].ne('0')]

# concat with the original dataframe, ignoring the extra columns
new_df = pd.concat((df,new_df), sort=False, join='inner')

# sort values
new_df.sort_values(['ID','Age at Diagnosis'])

Output:

    ID      Diagnosis   Age at Diagnosis
0   A-1     Birth           0
1   A-1     Lung cancer     25
1   A-1     Chemo           25
12  A-1     Surgery         25
2   A-1     Death           50
3   A-2     Birth           0
4   A-2     Brain cancer    12
4   A-2     Chemo           12
5   A-2     Skin cancer     20
5   A-2     Chemo           20
16  A-2     Surgery         20
6   A-2     Current age     23
7   A-3     Birth           0
8   A-3     Brain cancer    30
19  A-3     Surgery         30
9   A-3     Lung cancer     33
9   A-3     Chemo           33
10  A-3     Current age     35

answered Jun 4, 2019 at 21:31

Quang Hoang

151k11 gold badges64 silver badges86 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

had01 Over a year ago

I have a larger dataframe and have a couple of more columns like chemo and surgery that need needs to be accounted for like above. This code works on the dataset that I provided (it was an sample dataset) but it won't work on the larger one.

Will · Accepted Answer · 2019-06-04 21:17:04Z

1

This attempt is pretty verbose and takes a few steps. WE can't do a simple pivot or index/column stacking because we need to modify one column with partial results from another. This requires splitting and appending.

Firstly, convert your dataframe into dtypes we can work with.

data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])

df[["Age at Diagnosis", "Chemo", "Surgery"]] = df[["Age at Diagnosis", "Chemo", "Surgery"]].astype(int)

Now we split the thing up into bits and pieces.

# I like making a copy or resetting an index so that 
# pandas is not operating off a slice
df_chemo = df[df.Chemo > 0].copy()
df_surgery = df[df.Surgery > 0].copy()

# drop columns you don't need
df_chemo.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df_surgery.drop(["Chemo", "Surgery"], axis=1, inplace=True)
df.drop(["Chemo", "Surgery"], axis=1, inplace=True)

# Set Chemo and Surgery Diagnosis
df_chemo.Diagnosis = "Chemo"
df_surgery.Diagnosis = "Surgery"

Then append everything together. You can do this because the column dimensions match.

df_new = df.append(df_chemo).append(df_surgery)
# make it look pretty
df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)

edited Jun 4, 2019 at 21:17

answered Jun 4, 2019 at 21:04

Will

1,55010 silver badges22 bronze badges

2 Comments

ch33hau Over a year ago

df_new.sort_values(["ID", "Age at Diagnosis"]).reset_index(drop=True)

had01 Over a year ago

Hi Will, this didn't work on my larger dataset that has about 4 columns that need to be made into rows. It doesn't work at the part where I remove the zeros.

Collectives™ on Stack Overflow

How to restructure dataframe to convert column values to row values based on condition

2 Answers 2

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related