Let's say that I have this dataframe:
import pandas as pd
Name = ['Lolo', 'Mike', 'Tobias', 'Luke', 'Sam']
Age = [19, 34, 13, 45, 52]
Info_1 = ['Tall', 'Large', 'Small', 'Small','']
Info_2 = ['New York', 'Paris', 'Lisbon', '', 'Berlin']
Info_3 = ['Tall', 'Paris', 'Hi', 'Small', 'Thanks']
Data = [123,268,76,909,87]
Sex = ['F', 'M', 'M','M','M']
df = pd.DataFrame({'Name' : Name, 'Age' : Age, 'Info_1' : Info_1, 'Info_2' : Info_2, 'Info_3' : Info_3, 'Data' : Data, 'Sex' : Sex})
print(df)
     Name  Age  Info_1    Info_2  Info_3  Data  Sex
0    Lolo   19    Tall  New York    Tall   123    F
1    Mike   34   Large     Paris   Paris   268    M
2  Tobias   13   Small    Lisbon      Hi    76    M
3    Luke   45   Small             Small   909    M
4     Sam   52            Berlin  Thanks    87    M
I want to merge the data of four columns of this dataframe (Info_1, Info_2, Info_3 and Data) into a single column, without duplicated values within a row. For example, in row 0 I do not want "Tall" to appear twice. In the end, I would like to get something like this:
     Name  Age                Info  Sex
0    Lolo   19   Tall New York 123    F
1    Mike   34     Large Paris 268    M
2  Tobias   13  Small Lisbon Hi 76    M
3    Luke   45           Small 909    M
4     Sam   52    Berlin Thanks 87    M
I tried this to merge the data:
df['period'] = df[['Info_1', 'Info_2', 'Info_3', 'Data']].agg('-'.join, axis=1)
However, I get an error because join expects strings and the "Data" column contains integers. How can I include the "Data" column in the merge? And how can I make sure that I do not create duplicates within a row?
Thank you
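
For reference, here is a minimal sketch of one approach that might work. It assumes that empty strings mark missing values, that the first occurrence of each value should be kept in column order, and that a space is the wanted separator (as in the expected output above). The helper name merge_unique and the variables cols and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Lolo', 'Mike', 'Tobias', 'Luke', 'Sam'],
    'Age': [19, 34, 13, 45, 52],
    'Info_1': ['Tall', 'Large', 'Small', 'Small', ''],
    'Info_2': ['New York', 'Paris', 'Lisbon', '', 'Berlin'],
    'Info_3': ['Tall', 'Paris', 'Hi', 'Small', 'Thanks'],
    'Data': [123, 268, 76, 909, 87],
    'Sex': ['F', 'M', 'M', 'M', 'M'],
})

cols = ['Info_1', 'Info_2', 'Info_3', 'Data']

def merge_unique(row):
    # Hypothetical helper: skip empty strings, then let dict.fromkeys
    # drop repeated values while keeping the first occurrence in order.
    values = dict.fromkeys(v for v in row if v != '')
    return ' '.join(values)

# Cast to str first so the integers in "Data" can be joined with the text.
df['Info'] = df[cols].astype(str).apply(merge_unique, axis=1)

# Drop the source columns and restore the wanted column order.
result = df.drop(columns=cols)[['Name', 'Age', 'Info', 'Sex']]
print(result)
```

The astype(str) call is what avoids the "expected str" error, and the duplicates are removed per row only, so the same value can still appear in different rows. Replacing ' '.join with '-'.join would give the dash-separated form from the original attempt.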