
I have two data frames in Python like the following:

df1
CUSTOMER_KEY  LAST_NAME   FIRST_NAME
30            f2b6769129  97bb97bebc
46            ca0464878d  e276539bc2
51            62f2905a7a  8dfabd6d61
57            21032ca3bc  1f7e5e0c6e
62            f7e7fdd8ce  eb6cf4af99
64            f536998bbb  7fc39eacd1
80            6069198f63  d873a71620
99            0ba61a6f66  a6cf7af3eb
102           e8b579b776  c8048fd459

df2
CUSTOMER_KEY  LAST_NAME  FIRST_NAME
30            Arthur     Anderson
46            Teresa     Johns
51            Louise     Hurwitz
57            Timothy    Addy
62            Jeffery    Wilson
64            Andres     Tuller
80            Daniel     Green
99            Frank      Nader
102           Faith      Young

I want to join these two data frames on CUSTOMER_KEY (which I can do with merge) and then concatenate parts of a few columns from both data frames to form new strings in the result data frame. From the data frames above, the result I am looking for is as follows:

result_df
CUSTOMER_KEY LAST_NAME  FIRST_NAME
30           Artf2b676  And97bb97
46           Terca0464  Johe27653

Basically, take the first 3 characters of LAST_NAME from df2 and the first 6 characters of LAST_NAME from df1, and concatenate them into the new column, as in the expected output above. Similarly for the other columns.
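Expressed as Python string slicing, with the lengths read off the expected output (3 characters from df2, 6 from df1), the rule for a single pair of values looks like this. The helper name `combine` is hypothetical, used only for illustration:

```python
def combine(name: str, token: str) -> str:
    # First 3 characters of the plain-text value (df2)
    # followed by the first 6 characters of the token (df1)
    return name[:3] + token[:6]

print(combine('Arthur', 'f2b6769129'))    # Artf2b676
print(combine('Anderson', '97bb97bebc'))  # And97bb97
```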

How can I achieve this?

Thanks and Regards

Bala

2 Answers


Using the .str accessor

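# + aligns on the index, so df1 and df2 must hold the same customer at the same index label
df2['LAST_NAME'] = df2['LAST_NAME'].str[:3] + df1['LAST_NAME'].str[:6]
df2['FIRST_NAME'] = df2['FIRST_NAME'].str[:3] + df1['FIRST_NAME'].str[:6]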

df2
Out[768]: 
   CUSTOMER_KEY  LAST_NAME FIRST_NAME
0            30  Artf2b676  And97bb97
1            46  Terca0464  Johe27653
2            51  Lou62f290  Hur8dfabd
3            57  Tim21032c  Add1f7e5e
4            62  Jeff7e7fd  Wileb6cf4
5            64  Andf53699  Tul7fc39e
6            80  Dan606919  Gred873a7
7            99  Fra0ba61a  Nada6cf7a
8           102  Faie8b579  Youc8048f
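One caveat: `+` between two Series aligns on index labels, not on CUSTOMER_KEY, so the snippet above is only correct when both frames hold the same customer at the same index label. The sketch below uses two rows from the question with df1 deliberately reversed (the reordering is an assumption for illustration) to show the mismatch, and how using CUSTOMER_KEY as the index on both sides avoids it:

```python
import pandas as pd

# df1 rows are in reverse order on purpose
df1 = pd.DataFrame({'CUSTOMER_KEY': [46, 30],
                    'LAST_NAME': ['ca0464878d', 'f2b6769129']})
df2 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['Arthur', 'Teresa']})

# Index-aligned: index 0 of df2 (key 30) is paired with index 0 of df1 (key 46)
wrong = df2['LAST_NAME'].str[:3] + df1['LAST_NAME'].str[:6]
print(wrong.tolist())  # ['Artca0464', 'Terf2b676']

# Key-aligned: index both sides by CUSTOMER_KEY before concatenating
a = df2.set_index('CUSTOMER_KEY')['LAST_NAME'].str[:3]
b = df1.set_index('CUSTOMER_KEY')['LAST_NAME'].str[:6]
right = a + b
print(right.sort_index().to_dict())  # {30: 'Artf2b676', 46: 'Terca0464'}
```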

If you need to merge the frames on CUSTOMER_KEY first:

result = df1.merge(df2, on=['CUSTOMER_KEY'])
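After this merge, the overlapping name columns come back with pandas' default suffixes `_x` (from df1) and `_y` (from df2), so the concatenation can be done on rows matched by key instead of by index. A minimal sketch using the first two customers from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['f2b6769129', 'ca0464878d'],
                    'FIRST_NAME': ['97bb97bebc', 'e276539bc2']})
df2 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['Arthur', 'Teresa'],
                    'FIRST_NAME': ['Anderson', 'Johns']})

# Overlapping columns get the default suffixes: *_x from df1, *_y from df2
result = df1.merge(df2, on=['CUSTOMER_KEY'])
for col in ['LAST_NAME', 'FIRST_NAME']:
    # 3 characters from df2 followed by 6 characters from df1
    result[col] = result[col + '_y'].str[:3] + result[col + '_x'].str[:6]

result_df = result[['CUSTOMER_KEY', 'LAST_NAME', 'FIRST_NAME']]
print(result_df)
```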

2 Comments

@White Wen Thanks for your responses. That worked for me. I need to write some more logic to make it completely dynamic, without any hard-coded references to column names. Working on that now. Thanks a lot for your help. Regards, Bala
@BalajiKrishnan You're welcome, glad it helped. Have a nice day.
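For the follow-up about avoiding hard-coded column names, one possible sketch (not taken from either answer) is to combine every column the two frames share apart from the join key. The function name `combine_frames` and its parameters are hypothetical; the default lengths 3 and 6 come from the question's example:

```python
import pandas as pd

def combine_frames(masked: pd.DataFrame, plain: pd.DataFrame, key: str,
                   n_plain: int = 3, n_masked: int = 6) -> pd.DataFrame:
    """Hypothetical helper: merge on `key`, then for every shared column
    build plain[:n_plain] + masked[:n_masked]."""
    # Columns present in both frames, excluding the join key
    shared = [c for c in masked.columns if c in plain.columns and c != key]
    merged = masked.merge(plain, on=key, suffixes=('_masked', '_plain'))
    out = merged[[key]].copy()
    for col in shared:
        # astype(str) lets non-string columns be sliced too (NaN becomes 'nan')
        out[col] = (merged[col + '_plain'].astype(str).str[:n_plain]
                    + merged[col + '_masked'].astype(str).str[:n_masked])
    return out

df1 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['f2b6769129', 'ca0464878d'],
                    'FIRST_NAME': ['97bb97bebc', 'e276539bc2']})
df2 = pd.DataFrame({'CUSTOMER_KEY': [30, 46],
                    'LAST_NAME': ['Arthur', 'Teresa'],
                    'FIRST_NAME': ['Anderson', 'Johns']})

res = combine_frames(df1, df2, 'CUSTOMER_KEY')
print(res)
```

Only the key name and the two slice lengths are passed in; the columns to combine are discovered from the frames themselves.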

Using merge + the .str accessor

import pandas as pd
df = pd.DataFrame([
    ['30','f2b6769129','97bb97bebc'],
    ['46','ca0464878d','e276539bc2'],
    ['51','62f2905a7a','8dfabd6d61'],
    ['57','21032ca3bc','1f7e5e0c6e'],
    ['62','f7e7fdd8ce','eb6cf4af99'],
    ['64','f536998bbb','7fc39eacd1'],
    ['80','6069198f63','d873a71620'],
    ['99','0ba61a6f66','a6cf7af3eb'],
    ['102','e8b579b776','c8048fd459']]
)

df2 = pd.DataFrame([
    ['30','Arthur','Anderson'],
    ['46','Teresa','Johns'],
    ['51','Louise','Hurwitz'],
    ['57','Timothy','Addy'],
    ['62','Jeffery','Wilson'],
    ['64','Andres','Tuller'],
    ['80','Daniel','Green'],
    ['99','Frank','Nader'],
    ['102','Faith','Young']]
)

keys = ['CUSTOMER_KEY', 'LAST_NAME', 'FIRST_NAME']
df.columns = keys
df2.columns = keys
# Join on the key; overlapping columns get _1 (from df) and _2 (from df2)
df_join = pd.merge(df, df2, on='CUSTOMER_KEY', suffixes=('_1', '_2'))
# First 3 characters from df2 + first 6 from df (slice(0, 6), not 5, to match the expected output)
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0, 3) + df_join['LAST_NAME_1'].str.slice(0, 6)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0, 3) + df_join['FIRST_NAME_1'].str.slice(0, 6)
result_df = df_join[keys]

result_df.head()
