Match Strings of 2 dataframe columns in Python

Question

I have two data frames:

Df1:

The original df has 1000+ Name values.

   Id    Name
    1     Paper
    2     Paper Bag
    3     Scissors
    4     Mat
    5     Cat
    6     Good Cat

Df2:

The original df has 1000+ Item_Name values.

Item_ID   Item_Name
1         Paper Bag
2         wallpaper
3         paper
4         cat cage
5         good cat

Expected Output:

Id Name         Item_ID
1  Paper         1,2,3
2  Paper Bag     1,2,3
3  Scissors      NA 
4  Mat           NA 
5  Cat           4,5
6  Good Cat      4,5

My Code:

def matcher(x):
    res = df2.loc[df2['Item_Name'].str.contains(x, regex=False, case=False), 'Item_ID']
    return ','.join(res.astype(str))

df1['Item_ID'] = df1['Name'].apply(matcher)

Current Challenges

str.contains works when Name is Paper and Item_Name is Paper Bag, but it doesn't work the other way around. In my example it works for rows 1, 3, 4 and 5 of df1 but not for rows 2 and 6. For instance, it will not map row 2 of df1 (Paper Bag) to row 3 of df2 (paper).

Ask

Can you help me modify the code so that it also matches the other way around?

Why would it work the other way around? "Paper Bag" is not in "Paper"
– Maximilian Burszley, Commented Nov 28, 2018 at 16:29
So all I want is partial matching...any of the two or three words match with the given word
– Rahul Agarwal, Commented Nov 28, 2018 at 16:31
It sounds like your problem is breaking down the compound words and running comparisons against each then
– Maximilian Burszley, Commented Nov 28, 2018 at 16:32
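
The point made in these comments can be checked with plain Python strings, no pandas required. The sketch below is only an illustration, and its variable names are made up for the example. It shows that substring containment is one-directional, and that splitting the longer name into words recovers the match.

```python
# Substring containment is directional: the shorter string can be found
# inside the longer one, but not the reverse.
name = "Paper Bag"      # a Name from df1
item_name = "paper"     # an Item_Name from df2

whole_match = name.lower() in item_name.lower()    # "paper bag" in "paper"
reverse_match = item_name.lower() in name.lower()  # "paper" in "paper bag"

# Splitting the name into words and testing each word separately
# lets "Paper Bag" match "paper" through its first word.
words = name.lower().split()
word_match = any(word in item_name.lower() for word in words)

print(whole_match, reverse_match, word_match)  # False True True
```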

rahlf23 · Accepted Answer · 2018-11-28 17:00:00Z

3

You can modify your custom matcher function and use apply():

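def matcher(query):
    # Split the query into lowercase words, e.g. 'Paper Bag' -> ['paper', 'bag']
    words = query.lower().split()
    # Keep every Item_ID whose Item_Name contains at least one of those words
    matches = [row['Item_ID']
               for row in df2[['Item_ID', 'Item_Name']].to_dict('records')
               if any(w in row['Item_Name'].lower() for w in words)]
    if matches:
        return ','.join(map(str, matches))
    else:
        return 'NA'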

df1['Item_ID'] = df1['Name'].apply(matcher)

Returns:

   Id       Name Item_ID
0   1      Paper   1,2,3
1   2  Paper Bag   1,2,3
2   3   Scissors      NA
3   4        Mat      NA
4   5        Cat     4,5
5   6   Good Cat     4,5

Explanation:

We use apply() to run our custom matcher() function on each value of the df1['Name'] column. Inside matcher(), df2 is converted with to_dict('records') into a list of dictionaries, one per row, each holding that row's Item_ID and item name (the Item_Name column of df2). We split the current query into lowercase words and check whether any() of them occurs as a substring of the row's item name (also lowercased via lower()); if so, that row's Item_ID is added to the list. The matching IDs are returned as a comma-separated string, or 'NA' if there are none.
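
To see the mechanics without pandas, here is a minimal sketch of the same word-by-word matching over a hand-written list of dictionaries, which is the shape that to_dict('records') produces. The names item_records and match_ids are hypothetical, chosen for this illustration, and the rows are the sample data from the question.

```python
# Rows of df2 written out by hand in the shape that
# df2[['Item_ID', 'Item_Name']].to_dict('records') returns.
item_records = [
    {"Item_ID": 1, "Item_Name": "Paper Bag"},
    {"Item_ID": 2, "Item_Name": "wallpaper"},
    {"Item_ID": 3, "Item_Name": "paper"},
    {"Item_ID": 4, "Item_Name": "cat cage"},
    {"Item_ID": 5, "Item_Name": "good cat"},
]

def match_ids(query, records):
    """Return the comma-joined Item_IDs whose Item_Name contains any word of query."""
    words = query.lower().split()
    matches = [
        rec["Item_ID"]
        for rec in records
        if any(word in rec["Item_Name"].lower() for word in words)
    ]
    return ",".join(map(str, matches)) if matches else "NA"

names = ["Paper", "Paper Bag", "Scissors", "Mat", "Cat", "Good Cat"]
results = {name: match_ids(name, item_records) for name in names}
for name, ids in results.items():
    print(f"{name}: {ids}")
```

Because the test is a substring check, "paper" also matches "wallpaper", which is what the expected output in the question asks for. If whole-word matching were wanted instead, the individual words of each Item_Name would have to be compared rather than the raw string.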

edited Nov 28, 2018 at 17:00

answered Nov 28, 2018 at 16:44

rahlf23


4 Comments

Rahul Agarwal Over a year ago

I am running your code on my df...give me some time. Also, if you could add comments on what each part of the code is doing, it would be really helpful.

rahlf23 Over a year ago

I've added some commentary to my answer.

Rahul Agarwal Over a year ago

Just one more query: what if I need to map two columns instead of one, i.e. Item_ID and one more column, Material_Id? Is there any way we can tweak the code to do that? Currently I am running the same function twice, changing the column name each time.

rahlf23 Over a year ago

Go ahead and post that as a separate question and either I or someone else can help you work through it.
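
As a possible starting point for that follow-up, one option is to collect several columns in a single pass rather than calling the function once per column. The sketch below is not from the answer: match_columns is a hypothetical helper, and the Material_Id values are invented placeholders, since the question never shows that column.

```python
def match_columns(query, records, columns, name_key="Item_Name"):
    """For each requested column, comma-join the values of the rows whose
    name contains any word of query; 'NA' when nothing matches."""
    words = query.lower().split()
    hits = [rec for rec in records
            if any(word in rec[name_key].lower() for word in words)]
    return {
        col: ",".join(str(rec[col]) for rec in hits) if hits else "NA"
        for col in columns
    }

# Placeholder rows; the Material_Id values are made up for illustration.
records = [
    {"Item_ID": 1, "Material_Id": "M10", "Item_Name": "Paper Bag"},
    {"Item_ID": 2, "Material_Id": "M20", "Item_Name": "wallpaper"},
    {"Item_ID": 4, "Material_Id": "M40", "Item_Name": "cat cage"},
]

paper = match_columns("Paper", records, ["Item_ID", "Material_Id"])
mat = match_columns("Mat", records, ["Item_ID", "Material_Id"])
print(paper)
print(mat)
```

The rows are filtered once and every requested column is read from the same set of hits, so the matching work is not repeated per column. Turning the returned dictionaries into DataFrame columns is left out here to keep the sketch dependency-free.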
