0

I've seen a large number of similar questions but nothing quite answers what I am looking to do.

I have two dataframes

  • Conn_df that contains names and company details manually entered (e.g. Conn_df["Name", "Company_name", "Company_Address"]
  • Cleanse_df that contains cleaned up company names (e.g. Cleanse_df["Original_Company_Name", "Cleanse_Company_Name"]

The data for both is held in csv files that are imported into the script.

I want to change the company details in Conn_df.Company_Name using the values in Cleanse_df, where the Conn_df.Company_Name equals the Cleanse_df.Original_Company_Name and is replaced by Cleanse_df.Cleanse_Company_Name.

I have tried:

Conn_df["Company"] = Conn_df["Company"].replace(Conn_df["Company"], Cleanse_df["Cleansed"]) but got

replace() takes no keyword arguments

I also tried:

Conn_df["Company"] = Conn_df["Company"].map(Cleanse_df.set_index("Original")["Cleansed"]) but got

Reindexing only valid with uniquely valued Index objects

Any suggestions on how to get the values to be replaced. I would note that both dataframes run to many tens of thousands of rows, so creating a manual list is not possible.

3
  • 2
    Does 'Cleanse_df' contain the same company name two or more times? Commented Jun 23, 2020 at 18:42
  • The cleanse_df would contain the same company name multiple times in the Cleanse_Company_Name column, but only once in the Original_Company_Name column Commented Jun 23, 2020 at 19:05
  • 2
    Okay, can you create a small dataset that duplicates you error above? Commented Jun 23, 2020 at 19:07

1 Answer 1

1

I think you want something along the lines of this:

conn_df = pd.DataFrame({'Name':['Mac','K','Hutt'],
                    'Company_name':['McD','KFC','PH'],
                    'Company_adress':['street1','street2','street4']})
cleanse_df = pd.DataFrame({'Original_Company_Name':['McD'],'Cleanse_Company_Name': 
['MacDonalds']})
cleanse_df = cleanse_df.rename(columns={'Original_Company_Name':'Company_name'})
merged_df = conn_df.merge(cleanse_df,on='Company_name',how='left')
merged_df['Cleanse_Company_Name'].fillna(merged_df['Company_name'],inplace=True)
final_df =  merged_df[['Name','Company_adress','Cleanse_Company_Name']]\
    .rename(columns={'Cleanse_Company_Name':'Company_name'})

This would return:

   Name Company_adress Company_name
0   Mac        street1   MacDonalds
1     K        street2          KFC
2  Hutt        street4           PH

You merge the two dataframes and then keep the replaced new value, if there is no value to replace the name then the name will just stay the same, this is done by the fillna command.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, this is exactly what I was looking for. Tested it out, and ran in 45 seconds.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.