
I have a dataframe: df1

  SAP_Name  SAP_Class  SAP_Sec
  Avi       5          C 
  Rison     6          A 
  Slesh     7          B 
  San       8          C 
  Sud       7          B 

df2:

  Name_Fi  Class
  Avi      5
  Rison    6
  Slesh    7

I am trying to match df2 against df1 so that every df2 column whose values all appear in a df1 column takes that df1 column's header. Expected output:

SAP_Name  SAP_Class
 Avi            5
 Rison          6
 Slesh          7

This is the code I am using:

d = {}
for col2 in df2.columns:
    for col1 in df1.columns:
        cond = df2[col2].isin(df1[col1]).all()
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)

This gives the desired output on a small file. However, my actual lookup file has 112444 rows × 446 columns and the target file to be renamed has 3 rows × 35 columns, and in that case the code runs for a very long time. Can anyone please help me here?

2 Answers

In my opinion, if performance is important, convert each df2 column to a set and test it with issubset:

d = {}
for col2 in df2.columns:
    for col1 in df1.columns:
        cond = set(df2[col2]).issubset(df1[col1])
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)
  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7

EDIT:

# create a dictionary of Series without duplicates
dfs1 = {col1: df1[col1].drop_duplicates() for col1 in df1.columns}
# print(dfs1)

# create a dictionary of sets
set2 = {col2: set(df2[col2]) for col2 in df2.columns}
# print(set2)

# loop over both dictionaries and find the columns to rename
d = {}
for col2, v2 in set2.items():
    for col1, v1 in dfs1.items():
        cond = v2.issubset(v1)
        if cond:
            d[col2] = col1
df2 = df2.rename(columns=d)
print(df2)
  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7
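
A further variation on the same idea, offered only as a sketch: build a set for every df1 column as well, once, before the nested loop. As far as I know, `set.issubset` builds a temporary set whenever its argument is not already a set, so passing a Series makes it rebuild that set for every column pair. The sample frames below are reconstructed from the question. The names `sets1`, `sets2`, `mapping` and `renamed` are mine, and the `break` (keep the first matching df1 column instead of the last) is an assumption about what is wanted, not something the question states.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "SAP_Name": ["Avi", "Rison", "Slesh", "San", "Sud"],
    "SAP_Class": [5, 6, 7, 8, 7],
    "SAP_Sec": ["C", "A", "B", "C", "B"],
})
df2 = pd.DataFrame({
    "Name_Fi": ["Avi", "Rison", "Slesh"],
    "Class": [5, 6, 7],
})

# Build one set per column, exactly once for each frame.
sets1 = {col: set(df1[col]) for col in df1.columns}
sets2 = {col: set(df2[col]) for col in df2.columns}

# The nested loop now only compares ready-made sets.
mapping = {}
for col2, values2 in sets2.items():
    for col1, values1 in sets1.items():
        if values2 <= values1:  # same test as values2.issubset(values1)
            mapping[col2] = col1
            break  # assumption: the first matching df1 column wins

renamed = df2.rename(columns=mapping)
print(renamed.columns.tolist())
```

With 446 lookup columns and 35 target columns this still makes up to 446 × 35 comparisons, but each one only checks a three-element set against a prebuilt set. Whether that is fast enough on the real file is untested here.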

6 Comments

Thanks a lot @jezrael, I will check and let you know the status. :)
Hi @jezrael, the code is still running after an hour. Is this expected, considering the lookup data has 112444 rows × 446 columns and the target file has 3 rows × 35 columns?
It is not working either; the code keeps running. Is there a way to optimise this?
@anky_91 - I had an idea, please check the edited answer.
@anky_91 - The idea is simple: the nested loop is used only for the comparison. Removing duplicates and converting to sets happen in separate loops, so that work is done only once per column.

I'd rename the columns and use merge.

cols = ['SAP_Name', 'SAP_Class']
# On pandas 2.0 and later, set_axis has no inplace argument and returns a new object
df2.set_axis(cols, axis=1).merge(df1[cols])

  SAP_Name  SAP_Class
0      Avi          5
1    Rison          6
2    Slesh          7

1 Comment

Thank you @piRSquared , but I have too many columns to be renamed. Also the base file is huge so looking over it manually isn't the idea here. :)
