0

I have a phone directory that stores Department, Title, Email and Extension on seperate rows, the things being in common are First and Last Name. I have combined First and Last Name as a Key, and would like to merge the rows to where you would end up with a single row with the Name, Title, Department, Email and Extension.

I have tried creating a dictionary for each key, but I have not had any luck with the actual merging. This is where I am on coding it. I had to clean the data first to get the appropriate columns.

the table looks like the following:

LastName  FirstName  Department Title   Extension Email           Key
Doe       Jane       HR         Officer 0000                      Jane Doe
Doe       Jane       HR         Officer           [email protected]  Jane Doe
df = pd.read_excel("Directory.xlsx")
df = df.drop(columns = ["group_name","editable","id","contact_type","id2","account_id","server_uuid","picture",
             "dial_prefix","name","label","id3","transfer_name","value","key","primary","label4","id5",
             "type","display","group_name6"])

df = df.rename(index = str, columns = {"last_name":"Last Name","first_name":"First Name","location":"Department",
               "title":"Title","dial":"Extension","address":"Email"})

df["Key"] = df["First Name"].map(str) + " " + df["Last Name"].map(str)
LastName FirstName Department Title   Extension Email          Key  
Doe      Jane      HR         Officer 0000      [email protected] Jane Doe
2
  • 2
    Please add real names, somestr is not making your question clear and what you try to achieve. Try to fix your question so we can help you better. Commented Apr 5, 2019 at 16:47
  • Thanks Erfan, I just edited, hopefully it is easier to understand. Commented Apr 5, 2019 at 16:52

1 Answer 1

1

First we use DataFrame.replace to replace the whitespaces with NaN. Then use DataFrame.groupby and apply fillna with methods backfill and forwardfill to fill in your empty spaces. Finally we can use drop_duplicates to get the single row as wanted.

df['Key'] = df['FirstName'] + ' ' + df['LastName']
df.replace('', np.NaN, inplace=True)
df = df.groupby('Key').apply(lambda x: x.fillna(method='ffill').fillna(method='bfill')).drop_duplicates()

print(df)
  LastName FirstName Department    Title Extension           Email       Key
0      Doe      Jane         HR  Officer      0000  [email protected]  Jane Doe
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Erfan, that worked. I am relatively new to data science in Python and am still trying to learn the necessary packages.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.