I have a fruits dataframe with columns: (Name, Color) and a sentence dataframe with columns: (Sentence).

fruits dataframe

          Name   Color
0        Apple     Red
1        Mango  Yellow
2       Grapes   Green
3   Strawberry    Pink

sentence dataframe

                      Sentence
0  I like Apple, Mango, Grapes
1            I like ripe Mango
2             Grapes are juicy
3           Oranges are citric

I need to compare each row of the fruits dataframe with every row of the sentence dataframe and, if the fruit name appears in the sentence as a whole word, insert its color before the fruit name in the sentence.

This is what I have done using dataframe.apply():

import pandas as pd
import re

# create fruit dataframe 
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color']) 
print(fruit_df)

# create sentence dataframe 
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy'] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence']) 
print(sentence_df)


def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        # a loop is used because a fruit can appear more than once in a sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())

        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")

        new_desc = ''.join(arr)
        return new_desc

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The result I am getting is:

                      Sentence     Result
0  I like Apple, Mango, Grapes       None
1            I like ripe Mango       None
2             Grapes are juicy       None
3           Oranges are citric       None

The expected result:

                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric       

I also tried iterating through fruit_df using itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
   result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
   print(result)

I can't understand why the value returned by search function is not stored in the new column. Is this the right way to do it or am I missing something?

3 Answers

We can build a mapping Series from the fruits dataframe and pass it to Series.replace with regex=True, which substitutes each occurrence of a fruit name in the Sentence column with its replacement (Color + ' ' + Name). Here fruits and sentence are the two dataframes shown in the question:

fruit = r'\b' + fruits['Name'] + r'\b'
fruit_replacement = list(fruits['Color'] + ' ' + fruits['Name'])

mapping = pd.Series(fruit_replacement, index=fruit)
sentence['Result'] = sentence['Sentence'].replace(mapping, regex=True)

>>> sentence
                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric                            Oranges are citric
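
As a variation on the same idea (not the code from this answer), the replacement can also be done in a single pass with one combined pattern and a lookup dict. This is a minimal sketch, assuming the four fruits and four sentences shown in the question; color_of and pattern are names introduced here for illustration, and re.escape is added in case a fruit name ever contains regex metacharacters.

```python
import re
import pandas as pd

# Data as displayed in the question.
fruits = pd.DataFrame({
    "Name": ["Apple", "Mango", "Grapes", "Strawberry"],
    "Color": ["Red", "Yellow", "Green", "Pink"],
})
sentence = pd.DataFrame({
    "Sentence": [
        "I like Apple, Mango, Grapes",
        "I like ripe Mango",
        "Grapes are juicy",
        "Oranges are citric",
    ]
})

# Hypothetical helper names: a name -> color lookup and one alternation pattern.
color_of = dict(zip(fruits["Name"], fruits["Color"]))
pattern = r"\b(?:" + "|".join(re.escape(name) for name in color_of) + r")\b"

# Series.str.replace accepts a callable replacement when regex=True;
# the callable receives the match object and returns the new text.
sentence["Result"] = sentence["Sentence"].str.replace(
    pattern,
    lambda m: f"{color_of[m.group(0)]} {m.group(0)}",
    regex=True,
)

print(sentence["Result"].tolist())
```

Because every sentence is scanned only once, text inserted for one fruit is never re-examined when another fruit is processed.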

2 Comments

Thank you for the solution! This approach is less time consuming than my current approach.
@Animeartist Happy coding!

The problem is that compare is called once for each row of fruit_df, but every call reads the original Sentence column and overwrites Result.

I have just added some debugging prints to the compare function to understand what happens:

def compare(name, color):
    print(name, color)
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    print(sentence_df['Result'])

and got:

Apple Red
0    I like Red Apple, Mango, Grapes
1                               None
2                               None
Name: Result, dtype: object
Mango Yellow
0    I like Apple, Yellow Mango, Grapes
1              I like ripe Yellow Mango
2                                  None
Name: Result, dtype: object
Grapes Green
0    I like Apple, Mango, Green Grapes
1                                 None
2               Green Grapes are juicy
Name: Result, dtype: object

So the color is added correctly when the fruit is present, but search returns None when it is not, and every pass starts again from the original Sentence column, so only the result of the last pass is kept.
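
Both effects can be reproduced without pandas. The sketch below is only an illustration, not the asker's code: search_without_fallback is a hypothetical stand-in for search that returns a value only when the fruit is found, and a plain list stands in for the Sentence column.

```python
import re

def search_without_fallback(desc, name, color):
    # Returns only inside the if-branch, like the question's search().
    pattern = r"\b" + re.escape(name) + r"\b"
    if re.search(pattern, desc):
        return re.sub(pattern, lambda m: color + " " + m.group(0), desc)
    # No return on this path: Python implicitly returns None.

sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]
fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green")]

# Each pass reads the original sentences and overwrites result,
# which mirrors what compare() does to sentence_df['Result'].
for name, color in fruits:
    result = [search_without_fallback(s, name, color) for s in sentences]

print(result)
```

After the loop, result holds only what the last fruit (Grapes) produced, with None wherever that fruit did not occur.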

How to fix:

  1. First add a missing return desc in search, to avoid the None results

     def search(desc, name, color):
    
         if re.findall(r"\b" + name + r"\b", desc):
                 ...                 
                 new_desc = ''.join(arr)
                 return new_desc
         return desc
    
  2. Initialize sentence_df['Result'] before applying compare, and use it as the input of each pass:

     def compare(name, color):
         sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))
    
     sentence_df['Result'] = sentence_df['Sentence']
     fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
    

This finally gives the expected result:

The final result is: 
0    I like Red Apple, Yellow Mango, Green Grapes
1                        I like ripe Yellow Mango
2                          Green Grapes are juicy
Name: Result, dtype: object
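
Putting the two fixes together, a complete runnable version could look like the sketch below. It keeps the question's three fruits and three sentences, but note two substitutions that are not in the original code: re.sub with a callable replaces the manual index bookkeeping (it returns the text unchanged when nothing matches, so no None appears), and a plain loop over itertuples replaces fruit_df.apply.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]}
)

def search(desc, name, color):
    # Insert the color before every whole-word occurrence of name.
    # When there is no match, re.sub returns desc unchanged.
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, lambda m: color + " " + m.group(0), desc)

# Fix 2: start from the original sentences, then let each fruit
# update the same Result column instead of restarting from Sentence.
sentence_df["Result"] = sentence_df["Sentence"]
for row in fruit_df.itertuples(index=False):
    sentence_df["Result"] = sentence_df["Result"].apply(
        lambda s: search(s, row.Name, row.Color)
    )

print("The final result is:")
print(sentence_df["Result"])
```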

2 Comments

Nice explanation!
Thank you for the solution! Initializing the result column did the trick.
Create a mapping dict from each fruit name to "Color Name", then replace with regex=True:

di = {fr: f"{co} {fr}" for fr, co in fruit_df.values}
res = sentence_df.replace(di, regex=True)

res:

    Sentence
0   I like Red Apple, Yellow Mango, Green Grapes
1   I like ripe Yellow Mango
2   Green Grapes are juicy
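
One caveat: the dict keys here are bare fruit names without \b word boundaries, so as regular expressions they also match inside longer words. The sketch below shows the difference with plain re; the sentence containing "Grapeseed" is a made-up example, not data from the question.

```python
import re

# Hypothetical sentence where a fruit name is a prefix of a longer word.
text = "Grapeseed oil is not Grapes"

# Bare name as the pattern: also matches the start of "Grapeseed".
plain = re.sub("Grapes", "Green Grapes", text)

# Name wrapped in word boundaries: matches only the standalone word.
bounded = re.sub(r"\bGrapes\b", "Green Grapes", text)

print(plain)
print(bounded)
```

If whole-word matching is required, the keys can be built as r"\b" + name + r"\b", as in the question's search function.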
