I am trying to replace values in multiple rows of a pandas DataFrame with values from another DataFrame.
Suppose I have 10,000 rows of customer_id in my DataFrame df1, and I want to replace some of these customer_id values with 3,000 values from df2.
For the sake of illustration, let's generate the dataframes (below).
Say these 10 rows in df1 represent 10,000 rows, and the 3 rows from df2 represent 3,000 values.
import numpy as np
import pandas as pd
np.random.seed(42)
# Create df1 with unique values
arr1 = np.arange(100,200,10)
np.random.shuffle(arr1)
df1 = pd.DataFrame(data=arr1,
                   columns=['customer_id'])
# Create df2 with the new unique values
df2 = pd.DataFrame(data=[1800, 1100, 1500],
                   index=[180, 110, 150],  # these are customer_id values from df1
                   columns=['customer_id_new'])
I want to replace 180 with 1800, 110 with 1100, and 150 with 1500.
I know we can do the following:
# Replace multiple values using a hand-written mapping
replace_values = {180: 1800, 110: 1100, 150: 1500}
df1_replaced = df1.replace({'customer_id': replace_values})
And it works fine if I only have a few lines...
But if I have thousands of lines that I need to replace, how could I do this without typing out what values I want to change one at a time?
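One way to avoid typing the mapping by hand would be to build the dictionary directly from df2. This is only a minimal sketch: it assumes df2's index holds the old customer_id values and that those index values are unique, as in the sample data above.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Rebuild the sample frames from the question
arr1 = np.arange(100, 200, 10)
np.random.shuffle(arr1)
df1 = pd.DataFrame(data=arr1, columns=['customer_id'])
df2 = pd.DataFrame(data=[1800, 1100, 1500],
                   index=[180, 110, 150],
                   columns=['customer_id_new'])

# Build the old -> new mapping from df2 instead of writing it out
# (assumes df2's index contains the old customer_id values)
replace_values = df2['customer_id_new'].to_dict()

# Same replace call as before, now driven by df2
df1_replaced = df1.replace({'customer_id': replace_values})
```

The replace call itself is unchanged; only the source of the dictionary differs.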
EDIT: To clarify, I don't need to use replace. Anything that replaces those values in df1 with the values from df2 in the fastest, most efficient way is OK.
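Since replace is not required, another possibility is Series.map with a fallback to the original values. This is a sketch under the same assumptions (df2's index holds the old customer_id values and is unique); the name df1_mapped is just for illustration, and I have not benchmarked it against replace.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Rebuild the sample frames from the question
arr1 = np.arange(100, 200, 10)
np.random.shuffle(arr1)
df1 = pd.DataFrame(data=arr1, columns=['customer_id'])
df2 = pd.DataFrame(data=[1800, 1100, 1500],
                   index=[180, 110, 150],
                   columns=['customer_id_new'])

# Look up each customer_id in df2's index; ids without a match become NaN
mapped = df1['customer_id'].map(df2['customer_id_new'])

# Fall back to the original id where there was no match,
# then restore the original integer dtype (NaN forces float)
df1_mapped = df1.copy()
df1_mapped['customer_id'] = (
    mapped.fillna(df1['customer_id']).astype(df1['customer_id'].dtype)
)
```

Rows whose customer_id does not appear in df2 keep their original value; df1 itself is left untouched because the result is written to a copy.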