Mapping 2 dataframes and replacing header of matched values in target dataframe

Question

I have a dataframe: df1

  SAP_Name  SAP_Class  SAP_Sec
  Avi       5          C 
  Rison     6          A 
  Slesh     7          B 
  San       8          C 
  Sud       7          B

df2:

  Name_Fi  Class
  Avi      5
  Rison    6
  Slesh    7

I am trying to match df2 against df1 so that every df2 column whose values all appear in a df1 column gets that df1 column's header. Expected output:

  SAP_Name  SAP_Class
  Avi       5
  Rison     6
  Slesh     7

Below is the code I am using:

d = {}
for col2 in df2.columns:
    for col1 in df1.columns:
        # True if every value of df2[col2] occurs somewhere in df1[col1]
        cond = df2[col2].isin(df1[col1]).all()
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)

This gives the desired output on a small file. However, my actual lookup file has 112444 rows × 446 columns and the target file to be renamed has 3 rows × 35 columns, and on that data the code runs for a very long time. Can anyone help me speed this up?

jezrael · Accepted Answer · 2018-07-27 10:51:12Z

2

In my opinion, if performance is important, use issubset with a set:

d = {}
for col2 in df2.columns:
    for col1 in df1.columns:
        # True if every value of df2[col2] occurs somewhere in df1[col1]
        cond = set(df2[col2]).issubset(df1[col1])
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)
  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7
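
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same issubset idea on the sample data from the question. The helper name match_columns is made up for illustration; it is not part of pandas or of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "SAP_Name": ["Avi", "Rison", "Slesh", "San", "Sud"],
    "SAP_Class": [5, 6, 7, 8, 7],
    "SAP_Sec": ["C", "A", "B", "C", "B"],
})
df2 = pd.DataFrame({"Name_Fi": ["Avi", "Rison", "Slesh"], "Class": [5, 6, 7]})


def match_columns(source, target):
    """Hypothetical helper: map each target column to a source column
    that contains all of the target column's values."""
    mapping = {}
    for tcol in target.columns:
        wanted = set(target[tcol])  # built once per target column
        for scol in source.columns:
            if wanted.issubset(source[scol]):
                mapping[tcol] = scol  # a later match overwrites an earlier one
    return mapping


mapping = match_columns(df1, df2)
renamed = df2.rename(columns=mapping)
print(renamed)
```

Columns of df2 with no match in df1 are simply left out of the mapping, so rename keeps their original headers.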

EDIT:

# create a dictionary of Series without duplicates
dfs1 = {col1: df1[col1].drop_duplicates() for col1 in df1.columns}
# print(dfs1)

# create a dictionary of sets
set2 = {col2: set(df2[col2]) for col2 in df2.columns}
# print(set2)

# loop over both dictionaries and find the columns to rename
d = {}
for col2, v2 in set2.items():
    for col1, v1 in dfs1.items():
        cond = v2.issubset(v1)
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)
  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7
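
One thing to watch for with a wide lookup frame: d[col2] = col1 keeps only the last matching column, so a short column such as [5, 6, 7] could silently be renamed to any of several df1 columns that happen to contain those values. Below is a possible sketch (not from the original answer) that builds every set once and records all candidate matches, renaming only when there is exactly one. The names source_sets, target_sets and candidates are illustrative, and the sketch assumes the columns being matched contain no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "SAP_Name": ["Avi", "Rison", "Slesh", "San", "Sud"],
    "SAP_Class": [5, 6, 7, 8, 7],
    "SAP_Sec": ["C", "A", "B", "C", "B"],
})
df2 = pd.DataFrame({"Name_Fi": ["Avi", "Rison", "Slesh"], "Class": [5, 6, 7]})

# Build each set exactly once, outside the nested comparison.
source_sets = {col: set(df1[col].unique()) for col in df1.columns}
target_sets = {col: set(df2[col].unique()) for col in df2.columns}

# For every df2 column, list all df1 columns that contain all of its values.
candidates = {
    tcol: [scol for scol, svals in source_sets.items() if tvals <= svals]
    for tcol, tvals in target_sets.items()
}

# Rename only where the match is unambiguous; report the rest.
mapping = {t: s[0] for t, s in candidates.items() if len(s) == 1}
ambiguous = {t: s for t, s in candidates.items() if len(s) > 1}

df2_renamed = df2.rename(columns=mapping)
print(df2_renamed)
print("ambiguous:", ambiguous)
```

With set-to-set comparison the nested loop only does small subset checks, and anything left in ambiguous can be reviewed by hand instead of being renamed arbitrarily.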

edited Jul 27, 2018 at 10:51

answered Jul 27, 2018 at 5:22

jezrael


6 Comments

anky Over a year ago

Thanks a lot @jezrael, I will check and let you know the status. :)

anky Over a year ago

Hi @jezrael, the code is still running. It's been an hour; is this expected, given that the lookup data has 112444 rows × 446 columns and the target file has 3 rows × 35 columns?

anky Over a year ago

It's not working either; the code keeps running. Is there a way to optimise this?

jezrael Over a year ago

@anky_91 - I have one idea, please check the edited answer.

jezrael Over a year ago

@anky_91 - The idea is simple: the nested loop is used only for comparing. Removing duplicates and converting to sets happen in separate loops, so that work is done only once per column.


piRSquared · Accepted Answer · 2018-07-27 05:31:26Z

2

I'd rename the columns and use merge.

cols = ['SAP_Name', 'SAP_Class']
# set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0
df2.set_axis(cols, axis=1).merge(df1[cols])

  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7
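
Here is a runnable sketch of the merge approach on the sample data, assuming a pandas version where set_axis returns a new frame by default (1.0 or later, as far as I know). The left join with indicator=True is my addition and not part of the answer: it keeps every df2 row and marks whether it was found in df1, whereas the default inner merge drops unmatched rows.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "SAP_Name": ["Avi", "Rison", "Slesh", "San", "Sud"],
    "SAP_Class": [5, 6, 7, 8, 7],
    "SAP_Sec": ["C", "A", "B", "C", "B"],
})
df2 = pd.DataFrame({"Name_Fi": ["Avi", "Rison", "Slesh"], "Class": [5, 6, 7]})

# Target headers are chosen by hand here, in df2's column order.
cols = ["SAP_Name", "SAP_Class"]
renamed = df2.set_axis(cols, axis=1)

# Left join on the shared column names; the _merge column reports
# "both" for rows found in df1 and "left_only" for rows that are not.
checked = renamed.merge(df1[cols].drop_duplicates(), how="left", indicator=True)
print(checked)
```

This only works when the target headers are already known, which is the limitation raised in the comment below.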

answered Jul 27, 2018 at 5:31

piRSquared


1 Comment

anky Over a year ago

Thank you @piRSquared , but I have too many columns to be renamed. Also the base file is huge so looking over it manually isn't the idea here. :)
