I have two data frames:

Df1:

(The original df has 1000+ Name values.)

   Id    Name
    1     Paper
    2     Paper Bag
    3     Scissors
    4     Mat
    5     Cat
    6     Good Cat

Df2:

(The original df has 1000+ Item_Name values.)

Item_ID   Item_Name
1         Paper Bag
2         wallpaper
3         paper
4         cat cage
5         good cat

Expected Output:

Id Name         Item_ID
1  Paper        1,2,3
2  Paper Bag    1,2,3
3  Scissors     NA
4  Mat          NA
5  Cat          4,5
6  Good Cat     4,5

My Code:

def matcher(x):
    res = df2.loc[df2['Item_Name'].str.contains(x, regex=False, case=False), 'Item_ID']
    return ','.join(res.astype(str))

df1['Item_ID'] = df1['Name'].apply(matcher)

Current Challenges

str.contains works when Name is "Paper" and Item_Name is "Paper Bag", but it doesn't work the other way around. So in my example it works for rows 1, 3, 4 and 5 of df1 but not for rows 2 and 6. For instance, it will not map row 2 of df1 ("Paper Bag") to row 3 of df2 ("paper").
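To make the asymmetry concrete, here is a minimal sketch that rebuilds the sample df2 from above and runs the same str.contains check for two names. The helper name contains_matcher and the variables paper_hits and paper_bag_hits are only illustrative, and it returns a list instead of a joined string to make the result easier to inspect.

```python
import pandas as pd

# Sample df2 copied from the question.
df2 = pd.DataFrame({
    "Item_ID": [1, 2, 3, 4, 5],
    "Item_Name": ["Paper Bag", "wallpaper", "paper", "cat cage", "good cat"],
})

def contains_matcher(x):
    # True where the whole string x occurs inside Item_Name (case-insensitive).
    mask = df2["Item_Name"].str.contains(x, regex=False, case=False)
    return df2.loc[mask, "Item_ID"].tolist()

# "paper" is a substring of all three paper-related items.
paper_hits = contains_matcher("Paper")          # [1, 2, 3]

# "paper bag" as a whole is only a substring of "Paper Bag" itself,
# so "wallpaper" and "paper" are missed.
paper_bag_hits = contains_matcher("Paper Bag")  # [1]
```

The check only asks whether the full Name is contained in Item_Name, so a longer Name can never match a shorter Item_Name.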

Ask

Could you help me modify the code so that it also matches the other way around?

  • Why would it work the other way around? "Paper Bag" is not in "Paper". Commented Nov 28, 2018 at 16:29
  • So all I want is partial matching: a match if any of the two or three words matches the given word. Commented Nov 28, 2018 at 16:31
  • It sounds like your problem is breaking down the compound words and running comparisons against each, then. Commented Nov 28, 2018 at 16:32
  • Yes! Only for compound words. Commented Nov 28, 2018 at 16:34

1 Answer

You can modify your custom matcher function and use apply():

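def matcher(query):
    # Split the df1 name into lowercase words, e.g. "Paper Bag" -> ["paper", "bag"].
    words = query.lower().split()
    # Keep every Item_ID whose Item_Name contains at least one of those words.
    # Note: the column in df2 is 'Item_Name', not 'Name'.
    matches = [i['Item_ID'] for i in df2[['Item_ID', 'Item_Name']].to_dict('records')
               if any(q in i['Item_Name'].lower() for q in words)]
    if matches:
        return ','.join(map(str, matches))
    else:
        return 'NA'

df1['Item_ID'] = df1['Name'].apply(matcher)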

Returns:

   Id       Name Item_ID
0   1      Paper   1,2,3
1   2  Paper Bag   1,2,3
2   3   Scissors      NA
3   4        Mat      NA
4   5        Cat     4,5
5   6   Good Cat     4,5

Explanation:

We are using apply() to call our custom matcher() function on each value of your df1['Name'] column. In matcher(), we convert df2 with to_dict('records') into a list of dictionaries, one per row, each holding that row's Item_ID and Item_Name. We then lowercase the current query and split it into words, and use any() to check whether at least one of those words occurs in the Item_Name value from df2 (converted to lowercase via lower()). If so, we add that row's Item_ID to the list to be returned. If nothing matches, the function returns 'NA'.
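As a variation on the same idea, here is a sketch that keeps the word matching inside pandas by turning the words of each name into one regular expression. This is an alternative technique, not the code from the answer above: it assumes the sample frames from the question, and re.escape is used so that names containing regex metacharacters are still treated literally.

```python
import re

import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["Paper", "Paper Bag", "Scissors", "Mat", "Cat", "Good Cat"],
})
df2 = pd.DataFrame({
    "Item_ID": [1, 2, 3, 4, 5],
    "Item_Name": ["Paper Bag", "wallpaper", "paper", "cat cage", "good cat"],
})

def matcher(query):
    words = query.split()
    if not words:
        # An empty pattern would match every row, so bail out early.
        return "NA"
    # "Paper Bag" -> "Paper|Bag": match if any single word occurs in Item_Name.
    pattern = "|".join(re.escape(w) for w in words)
    mask = df2["Item_Name"].str.contains(pattern, case=False, regex=True)
    ids = df2.loc[mask, "Item_ID"].astype(str)
    return ",".join(ids) if not ids.empty else "NA"

df1["Item_ID"] = df1["Name"].apply(matcher)
print(df1)
```

Like the list comprehension version, this is a substring match per word, so "Paper" still matches "wallpaper", which is what the expected output in the question asks for.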

4 Comments

I am running your code on my df, give me some time. Also, if you could comment on what each part of the code is doing, it would be really helpful.
I've added some commentary to my answer.
Just one more query: what if I need to map two columns instead of one, i.e. Item_Id and one more column, Material_Id? Is there any way to tweak the code to do that? Currently I am running the same function twice, changing the column name each time.
Go ahead and post that as a separate question and either I or someone else can help you work through it.