I have two DataFrames, DFA and DFB.
I want to match each SKU in DFB against the SKUs in DFA, either exactly or by substring, and merge DFA's Company column into DFB.
Code
# Remove special characters (the pattern keeps spaces; they are removed in the next step)
dfa['Clean SKU'] = dfa['SKU'].replace(r'[^0-9a-zA-Z ]', '', regex=True)
dfb['Clean SKU'] = dfb['SKU'].replace(r'[^0-9a-zA-Z ]', '', regex=True)
# Remove white space
dfa['Clean SKU'] = dfa['Clean SKU'].replace(r'\s+', '', regex=True)
dfb['Clean SKU'] = dfb['Clean SKU'].replace(r'\s+', '', regex=True)
# Change dtypes to string
dfa['Clean SKU'] = dfa['Clean SKU'].astype(str)
dfb['Clean SKU'] = dfb['Clean SKU'].astype(str)
# Create a new column to merge on and convert it to lowercase
dfa['SKU_to_merge'] = dfa['Clean SKU'].str.lower()
# Build a regex alternation from the unique cleaned SKUs in DFA
pat = r'(%s)' % '|'.join(dfa['Clean SKU'].str.lower().unique())
# Create a column holding the DFA SKU that the regex finds inside each DFB SKU
dfb['SKU_to_merge'] = dfb['Clean SKU'].str.lower().str.extract(pat, expand=False)
# Merge the DFs on SKU_to_merge
dfb = dfb.merge(dfa[['SKU_to_merge', 'Company']], on='SKU_to_merge', how='left')
Issue: for SKU 601251x, SKU_to_merge should be 601251x, because that exact SKU exists in DFA (a substring match should only be used when a direct match is not possible). In this case the corresponding Company should therefore be Google, not Amazon.
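A minimal, self-contained reproduction of the issue might look like the sketch below. The actual contents of DFA and DFB are not shown, so the sample rows are hypothetical (the Amazon/Google companies are assumed from the description above). The sketch also collapses the two cleaning passes into one pattern, `[^0-9a-zA-Z]`, which removes the same characters, and converts to string before cleaning.

```python
import pandas as pd

# Hypothetical sample data: the real DFA/DFB contents are not shown in the question
dfa = pd.DataFrame({'SKU': ['601251', '601251x'], 'Company': ['Amazon', 'Google']})
dfb = pd.DataFrame({'SKU': ['601251x', 'AB-601251']})

for df in (dfa, dfb):
    # Keep only letters and digits (drops spaces and special characters in one pass)
    df['Clean SKU'] = df['SKU'].astype(str).str.replace(r'[^0-9a-zA-Z]', '', regex=True)

dfa['SKU_to_merge'] = dfa['Clean SKU'].str.lower()
# Alternatives follow DFA row order, so '601251' is listed before '601251x'
pat = r'(%s)' % '|'.join(dfa['Clean SKU'].str.lower().unique())
dfb['SKU_to_merge'] = dfb['Clean SKU'].str.lower().str.extract(pat, expand=False)
dfb = dfb.merge(dfa[['SKU_to_merge', 'Company']], on='SKU_to_merge', how='left')
print(dfb[['SKU', 'SKU_to_merge', 'Company']])
```

With this data both DFB rows end up with SKU_to_merge 601251 and Company Amazon, which reproduces the problem described above.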


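`601251` comes before `601251x` in `pat`, and a regex alternation returns the first alternative that matches at the leftmost position, so `601251x` is never tried. You could try using `dfa['Clean SKU'].sort_values(key=lambda c: -c.str.len())` instead of `dfa['Clean SKU']` when building `pat` (sorting by length, descending).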
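A sketch of that suggestion, using hypothetical sample rows since the real frames are not shown. It first illustrates the alternation order with plain `re`, then builds `pat` from DFA's SKUs sorted longest first. The `key` argument to `sort_values` requires pandas 1.1 or later.

```python
import re
import pandas as pd

# Alternation takes the first alternative that matches at the leftmost position
print(re.search(r'(601251|601251x)', '601251x').group(1))  # 601251
print(re.search(r'(601251x|601251)', '601251x').group(1))  # 601251x

# Hypothetical sample data: the real DFA/DFB contents are not shown in the question
dfa = pd.DataFrame({'SKU': ['601251', '601251x'], 'Company': ['Amazon', 'Google']})
dfb = pd.DataFrame({'SKU': ['601251x', 'AB-601251']})

for df in (dfa, dfb):
    # Keep only letters and digits
    df['Clean SKU'] = df['SKU'].astype(str).str.replace(r'[^0-9a-zA-Z]', '', regex=True)

dfa['SKU_to_merge'] = dfa['Clean SKU'].str.lower()
# Sort DFA SKUs by length, descending, so '601251x' is tried before '601251'
longest_first = dfa['Clean SKU'].sort_values(key=lambda c: -c.str.len())
pat = r'(%s)' % '|'.join(longest_first.str.lower().unique())
dfb['SKU_to_merge'] = dfb['Clean SKU'].str.lower().str.extract(pat, expand=False)
dfb = dfb.merge(dfa[['SKU_to_merge', 'Company']], on='SKU_to_merge', how='left')
print(dfb[['SKU', 'SKU_to_merge', 'Company']])
```

Here 601251x maps to Google and AB-601251 still falls back to the substring match 601251 (Amazon). Any match is a substring of the DFB SKU, so an exact match starts at position 0 and is the longest candidate there; longest-first ordering therefore gives exact matches priority over substring matches. No `re.escape` is needed because the cleaned SKUs contain only letters and digits.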