Python: Merging two columns of two different pandas dataframe using string matching

Question

I am trying to perform string matching between two pandas dataframes.

df_1:
ID   Text           Text_ID
1    Apple            53
2    Banana           84
3    Torent File      77

df_2: 
ID   File_name      
22   apple_mask.txt
23   melon_banana.txt
24   Torrent.pdf
25   Abc.ppt

Objective: I want to populate Text_ID against File_name in df_2 if the string in df_1['Text'] matches df_2['File_name']. If no match is found, populate df_2['Text_ID'] with -1. The resulting df_2 should look like this:

ID   File_name           Text_ID
22   apple_mask.txt        53
23   melon_banana.txt      84
24   Torrent.pdf           77          
25   Abc.ppt               -1

I have tried this SO thread, but it produces a column that lists the fuzz score for each File_name.

I am trying a non-fuzzy approach. Here is my code:

text_ls = df_1['Text'].tolist()
file_ls = df_2['File_name'].tolist()
text_id = []
for i,j in zip(text_ls,file_ls):
  if str(j) in str(i):
    t_i = df_1.loc[df_1['Text']==i,'Text_ID']
    text_id.append(t_i)
  else:
    t_i = -1
    text_id.append(t_i)
df_2['Text_ID'] = text_id

But I am getting a blank text_id list.

Can anybody provide some clue on this? I am OK to use fuzzywuzzy as well.

fullfine · Accepted Answer · 2020-12-07 15:10:13Z

1

You can get it with the following code:

df2['Text_ID'] = -1    # set -1 by default for all the file names
for _, file_row in df2.iterrows():
    for _, text_row in df1.iterrows():
        # case-insensitive substring check; columns are read by label, not by position
        if text_row['Text'].lower() in file_row['File_name'].lower():
            # assign the Text_ID from df1 to the matching rows of df2
            df2.loc[df2['File_name'] == file_row['File_name'], 'Text_ID'] = text_row['Text_ID']
            break

Keep in mind:

String comparison: with the comparison as written, apple and banana are contained in apple_mask.txt and melon_banana.txt, but torent file is not contained in torrent.pdf, so that row keeps -1. Consider redefining the string comparison.
df.iterrows() yields two values per row: the row index and the row's values. I have replaced the index with _ because it is not needed to solve this problem.

result:

df2
          File_name  Text_ID
0    apple_mask.txt       53
1  melon_banana.txt       84
2       Torrent.pdf       -1
3           Abc.ppt       -1
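
The first note above suggests redefining the string comparison so that "Torent File" can still be paired with Torrent.pdf. One possible way, sketched below, is to compare individual words with the standard library's difflib.SequenceMatcher instead of using a plain substring test. The helper name best_text_id, the splitting on underscores, and the 0.8 threshold are assumptions made for illustration, not part of the answer; the threshold in particular would need tuning on real data.

```python
from difflib import SequenceMatcher
from pathlib import Path

import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({"ID": [1, 2, 3],
                     "Text": ["Apple", "Banana", "Torent File"],
                     "Text_ID": [53, 84, 77]})
df_2 = pd.DataFrame({"ID": [22, 23, 24, 25],
                     "File_name": ["apple_mask.txt", "melon_banana.txt",
                                   "Torrent.pdf", "Abc.ppt"]})

THRESHOLD = 0.8  # assumed similarity cutoff, tune for your data


def best_text_id(file_name, texts, threshold=THRESHOLD):
    """Return the Text_ID whose words best resemble a token of the file name.

    Hypothetical helper: returns -1 when no word pair reaches the threshold.
    """
    # Drop the extension and split the stem on underscores.
    stem = Path(str(file_name)).stem.lower()
    file_tokens = stem.replace("_", " ").split()
    best_score, best_id = 0.0, -1
    for text, text_id in texts:
        for word in str(text).lower().split():
            for token in file_tokens:
                score = SequenceMatcher(None, word, token).ratio()
                if score > best_score:
                    best_score, best_id = score, text_id
    return best_id if best_score >= threshold else -1


pairs = list(zip(df_1["Text"], df_1["Text_ID"]))
df_2["Text_ID"] = [best_text_id(name, pairs) for name in df_2["File_name"]]
print(df_2)
```

With the sample data, "torent" and "torrent" score about 0.92, so Torrent.pdf receives 77, while Abc.ppt stays at -1. A lower threshold catches more misspellings but also increases the risk of false matches.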

answered Dec 7, 2020 at 15:10


1 Comment

fullfine Over a year ago

if my answer was useful, can you mark it as an accepted answer?

SAL · Accepted Answer · 2020-12-07 12:41:58Z

0

You can try the following code:

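text_ls = df_1['Text'].tolist()
file_ls = df_2['File_name'].tolist()
df_2['Text_ID'] = -1  # default when no text matches
for j in file_ls:          # check every file name against every text,
    for i in text_ls:      # not only the pair at the same position
        if j.lower().find(i.lower()) == -1:
            continue  # find() returns -1 when i is not inside j
        # .iloc[0] extracts a scalar instead of assigning a Series
        t_i = df_1.loc[df_1['Text']==i,'Text_ID'].iloc[0]
        df_2.loc[df_2['File_name']==j,'Text_ID']=t_i
        break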

edited Dec 7, 2020 at 12:41

answered Dec 7, 2020 at 12:26


4 Comments

pythondumb Over a year ago

Not quite sure about this line if j.find(i) == -1:. Please note there is no entry in df_2 as -1. Also you seemed to have missed the fact that df_1['Text'] and df_2['File_name'] are not exactly same.

SAL Over a year ago

Refer following answer for further clarification. stackoverflow.com/a/27138045/3417134

SAL Over a year ago

I have updated the code to convert df_1['Text'] and df_2['File_name'] to lower case just for the sake of comparison.

pythondumb Over a year ago

Your method might fail if there are (1) blanks in df_2 or (2) repeats of the same file name in df_2. I just checked with my actual example and it performs correctly for the 1st match.
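
The two concerns in the last comment (blank entries and repeated file names in df_2) can be handled by evaluating each row on its own rather than looking rows up by file name. The following is a minimal sketch, not the answerer's code: the helper first_match and the modified df_2 with a missing and a repeated name are hypothetical, and the match is still a plain case-insensitive substring test.

```python
import pandas as pd

df_1 = pd.DataFrame({"Text": ["Apple", "Banana", "Torent File"],
                     "Text_ID": [53, 84, 77]})
# Hypothetical df_2 with a blank entry and a repeated file name.
df_2 = pd.DataFrame({"File_name": ["apple_mask.txt", None, "melon_banana.txt",
                                   "apple_mask.txt", "Abc.ppt"]})

# Lower-case the texts once; skip missing or empty texts so they cannot
# match every file name by accident.
lookup = [(str(t).lower(), tid)
          for t, tid in zip(df_1["Text"], df_1["Text_ID"])
          if pd.notna(t) and str(t).strip()]


def first_match(file_name):
    """Return the Text_ID of the first text contained in file_name, else -1."""
    if pd.isna(file_name) or not str(file_name).strip():
        return -1  # blank file names never match
    name = str(file_name).lower()
    for text, text_id in lookup:
        if text in name:
            return text_id
    return -1


# map() evaluates each row independently, so repeated names get the same ID.
df_2["Text_ID"] = df_2["File_name"].map(first_match)
print(df_2)
```

Because the result is computed per row, a duplicated file name simply receives the same Text_ID twice, and a missing name falls back to -1 instead of raising an error.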
