Row merging over strings in a dataframe?

Question

I have a phone directory that stores Department, Title, Email and Extension on separate rows; the only things the rows have in common are First and Last Name. I have combined First and Last Name into a Key, and would like to merge the rows so that each person ends up with a single row containing Name, Title, Department, Email and Extension.

I have tried creating a dictionary for each key, but I have not had any luck with the actual merging. I had to clean the data first to get the appropriate columns; the table and my code so far are shown below.

The table looks like the following:

LastName  FirstName  Department Title   Extension Email           Key
Doe       Jane       HR         Officer 0000                      Jane Doe
Doe       Jane       HR         Officer           [email protected]  Jane Doe

import pandas as pd

df = pd.read_excel("Directory.xlsx")
df = df.drop(columns = ["group_name","editable","id","contact_type","id2","account_id","server_uuid","picture",
             "dial_prefix","name","label","id3","transfer_name","value","key","primary","label4","id5",
             "type","display","group_name6"])

df = df.rename(index = str, columns = {"last_name":"Last Name","first_name":"First Name","location":"Department",
               "title":"Title","dial":"Extension","address":"Email"})

df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)

The result I would like is a single row per person:

LastName FirstName Department Title   Extension Email          Key
Doe      Jane      HR         Officer 0000      [email protected] Jane Doe

Please add real names, somestr is not making your question clear and what you try to achieve. Try to fix your question so we can help you better.
– Erfan, Commented Apr 5, 2019 at 16:47
Thanks Erfan, I just edited, hopefully it is easier to understand.
– Paul, Commented Apr 5, 2019 at 16:52

Erfan · Accepted Answer · 2019-04-05 16:56:46Z

First, use DataFrame.replace to turn the empty strings into NaN. Then group the rows by Key with DataFrame.groupby and forward-fill and back-fill within each group, so that every row of a group carries all of that person's values. Finally, drop_duplicates collapses the now-identical rows into the single row you want.

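import numpy as np

df['Key'] = df['FirstName'] + ' ' + df['LastName']
df = df.replace('', np.nan)  # treat empty strings as missing values
# Fill gaps within each Key in both directions, then drop the now-identical rows
df = df.groupby('Key', group_keys=False).apply(lambda x: x.ffill().bfill()).drop_duplicates()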

print(df)
  LastName FirstName Department    Title Extension           Email       Key
0      Doe      Jane         HR  Officer      0000  [email protected]  Jane Doe
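
As a possible alternative to filling and de-duplicating, the same result can probably be reached with groupby(...).first(), which takes the first non-null value of each column within a group. This is a different technique from the one above, shown only as a minimal sketch. The sample rows and the example.com address are made up for illustration, and it assumes the blank cells are empty strings, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows mirroring the layout in the question
df = pd.DataFrame({
    "LastName": ["Doe", "Doe"],
    "FirstName": ["Jane", "Jane"],
    "Department": ["HR", "HR"],
    "Title": ["Officer", "Officer"],
    "Extension": ["0000", ""],
    "Email": ["", "jane.doe@example.com"],  # placeholder address
})
df["Key"] = df["FirstName"] + " " + df["LastName"]

# Empty strings must become NaN, otherwise first() treats them as real values
df = df.replace("", np.nan)

# first() returns the first non-null entry of each column per group
merged = df.groupby("Key", as_index=False).first()
print(merged)
```

Note that if one person has two different non-empty values in the same column (say, two extensions), first() silently keeps only the earliest one, so it is worth checking for such conflicts before relying on it.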


1 Comment

Paul Over a year ago

Thanks Erfan, that worked. I am relatively new to data science in Python and am still trying to learn the necessary packages.
