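I have two dataframes that I would like to merge based on a partial (regex-style) match. If the part of the 'code' value in df1 before the dot (e.g. R93 from R93.2) matches the same part of 'ICD_CODE' in df2 (e.g. R93 from R93.1), the 'code' value should be added as a column to the matching row of df2.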

df1
code
R93.2
S03


df2
ICD_CODE    ICD_term                        MDR_code    MDR_term    
R93.1       Acute abdomen                   10000647    Acute abdomen   
K62.4       Stenosis of anus and rectum     10002581    Anorectal stenosis
S03.1       Hand-Schüller-Christian disease 10053135    Hand-Schueller-Christian disease

The expected output is:

code    ICD_CODE    ICD_term                        MDR_code    MDR_term    
R93.2   R93.1       Acute abdomen                   10000647    Acute abdomen   
S03     S03.1       Hand-Schüller-Christian disease 10053135    Hand-Schueller-Christian disease

Any help is highly appreciated!

  • Strictly speaking, I don't think Pandas supports doing this with a proper regex. But the answer below gets close to hacking it. Commented Jan 27, 2023 at 22:45

2 Answers

Use the left part (before the dot) of each code column as the merge key:

out = (df1.merge(df2, left_on=df1['code'].str.split('.').str[0], 
                right_on=df2['ICD_CODE'].str.split('.').str[0])
          .drop(columns='key_0'))
print(out)

# Output
    code ICD_CODE                         ICD_term  MDR_code                          MDR_term
0  R93.2    R93.1                    Acute abdomen  10000647                     Acute abdomen
1    S03    S03.1  Hand-Schüller-Christian disease  10053135  Hand-Schueller-Christian disease
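
Since the question mentions a regex, here is a minimal, self-contained sketch of the same prefix-key idea using str.extract instead of str.split. The pattern `^([^.]+)` and the helper column name `prefix` are my own assumptions, not something from the question, so adjust the pattern if your codes follow a different convention. The frames are rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"code": ["R93.2", "S03"]})
df2 = pd.DataFrame({
    "ICD_CODE": ["R93.1", "K62.4", "S03.1"],
    "ICD_term": ["Acute abdomen", "Stenosis of anus and rectum",
                 "Hand-Schüller-Christian disease"],
    "MDR_code": [10000647, 10002581, 10053135],
    "MDR_term": ["Acute abdomen", "Anorectal stenosis",
                 "Hand-Schueller-Christian disease"],
})

# Assumed pattern: the category is everything before the first dot.
PREFIX = r"^([^.]+)"

# Build the same helper key on both sides, merge on it, then drop it.
left = df1.assign(prefix=df1["code"].str.extract(PREFIX, expand=False))
right = df2.assign(prefix=df2["ICD_CODE"].str.extract(PREFIX, expand=False))

out = left.merge(right, on="prefix").drop(columns="prefix")
print(out)
```

Merging on a named helper column means there is no key_0 column to drop afterwards. Passing how="left" to merge would keep codes from df1 that have no counterpart in df2.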

A possible solution would be to use process.extractOne from fuzzywuzzy, which picks the closest-matching ICD_CODE for each code.

# pip install fuzzywuzzy
from fuzzywuzzy import process

# For each code in df1, find the most similar ICD_CODE in df2, then merge on it.
out = (df1.assign(matched_code=df1["code"].apply(lambda x: process.extractOne(x, df2["ICD_CODE"])[0]))
          .merge(df2, left_on="matched_code", right_on="ICD_CODE")
          .drop(columns="matched_code"))

Output:

print(out)
    code ICD_CODE                         ICD_term  MDR_code                          MDR_term
0  R93.2    R93.1                    Acute abdomen  10000647                     Acute abdomen
1    S03    S03.1  Hand-Schüller-Christian disease  10053135  Hand-Schueller-Christian disease
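
One caveat with fuzzy matching: as far as I can tell, extractOne with its default settings returns the best candidate even when nothing is really similar, so a code with no counterpart in df2 can still get attached to some row. Below is a rough sketch of guarding against that with a similarity cutoff. Note that it swaps in difflib from the standard library rather than fuzzywuzzy, so the scores are not comparable. The helper name `closest_code`, the cutoff of 0.6, and the extra code "Z99" (added only to show a non-match) are my assumptions.

```python
import difflib

import pandas as pd


def closest_code(code, choices, cutoff=0.6):
    """Return the most similar choice, or None if nothing reaches the cutoff."""
    hits = difflib.get_close_matches(code, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None


# Sample data from the question, plus "Z99", which has no counterpart in df2.
df1 = pd.DataFrame({"code": ["R93.2", "S03", "Z99"]})
df2 = pd.DataFrame({
    "ICD_CODE": ["R93.1", "K62.4", "S03.1"],
    "ICD_term": ["Acute abdomen", "Stenosis of anus and rectum",
                 "Hand-Schüller-Christian disease"],
    "MDR_code": [10000647, 10002581, 10053135],
    "MDR_term": ["Acute abdomen", "Anorectal stenosis",
                 "Hand-Schueller-Christian disease"],
})

choices = df2["ICD_CODE"].tolist()

out = (
    df1.assign(matched_code=df1["code"].map(lambda c: closest_code(c, choices)))
       .dropna(subset=["matched_code"])  # discard codes with no close match
       .merge(df2, left_on="matched_code", right_on="ICD_CODE")
       .drop(columns="matched_code")
)
print(out)
```

If the codes always share an exact prefix, the prefix-based merge is the safer choice; a similarity cutoff only reduces, and does not eliminate, the risk of pairing unrelated codes.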
