1

Let's say that I have this dataframe :

Name = ['Lolo', 'Mike', 'Tobias','Luke','Sam']
Age = [19, 34, 13, 45, 52]
Info_1 = ['Tall', 'Large', 'Small', 'Small','']
Info_2 = ['New York', 'Paris', 'Lisbon', '', 'Berlin']
Info_3 = ['Tall', 'Paris', 'Hi', 'Small', 'Thanks']
Data = [123,268,76,909,87]
Sex = ['F', 'M', 'M','M','M']

df = pd.DataFrame({'Name' : Name, 'Age' : Age, 'Info_1' : Info_1, 'Info_2' : Info_2, 'Info_3' : Info_3, 'Data' : Data, 'Sex' : Sex})

print(df)

     Name  Age Info_1    Info_2  Info_3  Data Sex
0    Lolo   19   Tall  New York    Tall   123   F
1    Mike   34  Large     Paris   Paris   268   M
2  Tobias   13  Small    Lisbon      Hi    76   M
3    Luke   45  Small             Small   909   M
4     Sam   52           Berlin  Thanks    87   M

I want to merge the data of four columns of this dataframe : Info_1, Info_2, Info_3, Data. I want to merge them without having duplicates of data for each row. That means for the row "0", I do not want to have "Tall" twice. So at the end I would like to get something like that :

     Name  Age                Info Sex
0    Lolo   19   Tall New York 123   F
1    Mike   34     Large Paris 268   M
2  Tobias   13  Small Lisbon Hi 76   M
3    Luke   45           Small 909   M
4     Sam   52    Berlin Thanks 87   M

I tried this function to merge the data :

di['period'] = df[['Info_1', 'Info_2', 'Info_3' 'Data']].agg('-'.join, axis=1)

However I get an error because it expects a string, How can I merge the data of the column "Data" ? And how can I check that I do not create duplicates

Thank you

2 Answers 2

2

Your Data columns seems to be int type. Convert it to strings first:

df['Data'] = df['Data'].astype(str)
df['period'] = (df[['Info_1','Info_2','Info_3','Data']]
                   .apply(lambda x: ' '.join(x[x!=''].unique()), axis=1)
               )

Output:

     Name  Age Info_1    Info_2  Info_3 Data Sex              period
0    Lolo   19   Tall  New York    Tall  123   F   Tall New York 123
1    Mike   34  Large     Paris   Paris  268   M     Large Paris 268
2  Tobias   13  Small    Lisbon      Hi   76   M  Small Lisbon Hi 76
3    Luke   45  Small             Small  909   M           Small 909
4     Sam   52           Berlin  Thanks   87   M    Berlin Thanks 87
Sign up to request clarification or add additional context in comments.

Comments

1

I think it's probably easiest to first just concatenate all the fields you want with a space in between:

df['Info'] = df.Info_1 + ' ' +  df.Info_2 + ' ' + df.Info_3 + ' ' + df.Data.astype(str)

Then you can write a function to remove the duplicate words from a string, something like this:

def remove_dup_words(s):
    words = s.split(' ')
    unique_words = pd.Series(words).drop_duplicates().tolist()
    return ' '.join(unique_words)

and apply that function to the Info field:

df['Info'] = df.Info.apply(remove_dup_words)

all the code together:

import pandas as pd

def remove_dup_words(s):
    words = s.split(' ')
    unique_words = pd.Series(words).drop_duplicates().tolist()
    return ' '.join(unique_words)

Name = ['Lolo', 'Mike', 'Tobias','Luke','Sam']
Age = [19, 34, 13, 45, 52]
Info_1 = ['Tall', 'Large', 'Small', 'Small','']
Info_2 = ['New York', 'Paris', 'Lisbon', '', 'Berlin']
Info_3 = ['Tall', 'Paris', 'Hi', 'Small', 'Thanks']
Data = [123,268,76,909,87]
Sex = ['F', 'M', 'M','M','M']

df = pd.DataFrame({'Name' : Name, 'Age' : Age, 'Info_1' : Info_1, 'Info_2' : Info_2, 'Info_3' : Info_3, 'Data' : Data, 'Sex' : Sex})

df['Info'] = df.Info_1 + ' ' +  df.Info_2 + ' ' + df.Info_3 + ' ' + df.Data.astype(str)
df['Info'] = df.Info.apply(remove_dup_words)

print(df)

     Name  Age Info_1    Info_2  Info_3  Data Sex                Info
0    Lolo   19   Tall  New York    Tall   123   F   Tall New York 123
1    Mike   34  Large     Paris   Paris   268   M     Large Paris 268
2  Tobias   13  Small    Lisbon      Hi    76   M  Small Lisbon Hi 76
3    Luke   45  Small             Small   909   M          Small  909
4     Sam   52           Berlin  Thanks    87   M    Berlin Thanks 87

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.