I have a dataframe with 5 columns. For two of them (Chemo and Surgery), I want every value greater than 0 to become a new row in the Diagnosis series, carrying over the individual's ID and the age at diagnosis.
Here is my dataframe:
import pandas as pd
data = [['A-1', 'Birth', '0', '0', '0'], ['A-1', 'Lung cancer', '25', '25','25'],['A-1', 'Death', '50', '0','0'],['A-2', 'Birth', '0', '0','0'], ['A-2','Brain cancer', '12', '12','0'],['A-2', 'Skin cancer', '20','20','20'], ['A-2', 'Current age', '23', '0','0'],['A-3', 'Birth','0','0','0'], ['A-3', 'Brain cancer', '30', '0','30'], ['A-3', 'Lung cancer', '33', '33', '0'], ['A-3', 'Current age', '35', '0','0']]
df = pd.DataFrame(data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"])
print(df)
I have tried to select the rows where Chemo/Surgery is greater than 0, but when I try to add them as new rows, it doesn't work.
This is what I want the end result to be:
    ID   Diagnosis     Age at Diagnosis
0   A-1  Birth         0
1   A-1  Lung cancer   25
2   A-1  Chemo         25
3   A-1  Surgery       25
4   A-1  Death         50
5   A-2  Birth         0
6   A-2  Brain cancer  12
7   A-2  Chemo         12
8   A-2  Skin cancer   20
9   A-2  Chemo         20
10  A-2  Surgery       20
11  A-2  Current age   23
12  A-3  Birth         0
13  A-3  Brain cancer  30
14  A-3  Surgery       30
15  A-3  Lung cancer   33
16  A-3  Chemo         33
17  A-3  Current age   35
This is one of the things I have tried:
chem = "Chemo"
try_df = (df[chem] > 1)
nd = df[try_df]
df["Diagnosis"] = df[chem]
print(df)
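For reference, here is a minimal sketch of one way the reshaping described above might be done. It is not the only approach. The helper name `expand_treatments` is hypothetical. The sketch assumes that the Chemo and Surgery values are numeric strings, and that a value greater than 0 is the age to record on the new row (in the sample data it always equals "Age at Diagnosis").

```python
import pandas as pd

data = [
    ['A-1', 'Birth', '0', '0', '0'],
    ['A-1', 'Lung cancer', '25', '25', '25'],
    ['A-1', 'Death', '50', '0', '0'],
    ['A-2', 'Birth', '0', '0', '0'],
    ['A-2', 'Brain cancer', '12', '12', '0'],
    ['A-2', 'Skin cancer', '20', '20', '20'],
    ['A-2', 'Current age', '23', '0', '0'],
    ['A-3', 'Birth', '0', '0', '0'],
    ['A-3', 'Brain cancer', '30', '0', '30'],
    ['A-3', 'Lung cancer', '33', '33', '0'],
    ['A-3', 'Current age', '35', '0', '0'],
]
df = pd.DataFrame(
    data, columns=["ID", "Diagnosis", "Age at Diagnosis", "Chemo", "Surgery"]
)


def expand_treatments(frame, treatment_cols=("Chemo", "Surgery")):
    """Hypothetical helper: emit each original row, followed by one extra
    row per treatment column whose value is greater than 0."""
    out_cols = ["ID", "Diagnosis", "Age at Diagnosis"]
    rows = []
    for _, rec in frame.iterrows():
        # Keep the original diagnosis row; the ages are stored as strings,
        # so convert them to int here.
        rows.append({
            "ID": rec["ID"],
            "Diagnosis": rec["Diagnosis"],
            "Age at Diagnosis": int(rec["Age at Diagnosis"]),
        })
        for col in treatment_cols:
            age = int(rec[col])
            if age > 0:
                # Assumption: the treatment column holds the age at treatment.
                rows.append({
                    "ID": rec["ID"],
                    "Diagnosis": col,
                    "Age at Diagnosis": age,
                })
    return pd.DataFrame(rows, columns=out_cols)


result = expand_treatments(df)
print(result)
```

Building a list of row dictionaries and constructing the new dataframe once at the end keeps the original row order, so each treatment row lands directly under the diagnosis it belongs to.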