Python: Value returned by function not getting updated in pandas dataframe

Question

I have a fruits dataframe with columns: (Name, Color) and a sentence dataframe with columns: (Sentence).

fruits dataframe

          Name   Color
0        Apple     Red
1        Mango  Yellow
2       Grapes   Green
3   Strawberry    Pink

sentence dataframe

                      Sentence
0  I like Apple, Mango, Grapes
1            I like ripe Mango
2             Grapes are juicy
3           Oranges are citric

I need to compare each row of the fruits dataframe with every row of the sentence dataframe and, if the fruit name appears in the sentence as a whole word, insert its color before the fruit name in the sentence.

This is what I have done using dataframe.apply():

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color']) 
print(fruit_df)

# create sentence dataframe 
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy'] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence']) 
print(sentence_df)


def search(desc, name, color):

    if re.findall(r"\b" + name + r"\b", desc):

        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())

        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")

        new_desc = ''.join(arr)
        return new_desc

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])

The result I am getting is:

                      Sentence     Result
0  I like Apple, Mango, Grapes       None
1            I like ripe Mango       None
2             Grapes are juicy       None
3           Oranges are citric       None

The expected result:

                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric

I also tried iterating through fruit_df using itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
   result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
   print(result)

I can't understand why the value returned by search function is not stored in the new column. Is this the right way to do it or am I missing something?

Shubham Sharma · Accepted Answer · 2021-03-08 16:49:14Z

4

We can create a mapping series from the fruits dataframe, then use it with Series.replace to substitute each occurrence of a fruit name in the Sentence column with the corresponding replacement (Color + Fruit name). The names fruit_df and sentence_df below are the dataframes from the question:

fruit = r'\b' + fruit_df['Name'] + r'\b'
fruit_replacement = list(fruit_df['Color'] + ' ' + fruit_df['Name'])

mapping = pd.Series(fruit_replacement, index=fruit)
sentence_df['Result'] = sentence_df['Sentence'].replace(mapping, regex=True)

>>> sentence_df
                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric                            Oranges are citric
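As a variation on the same idea, here is a minimal sketch that does the replacement in a single pass with one compiled alternation pattern and a replacement callback. It is not the method above, just an alternative under the assumption that fruit names may contain regex metacharacters, which is why re.escape is used. The names color_by_name, pattern and add_color are made up for this illustration.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    {"Sentence": ["I like Apple, Mango, Grapes", "I like ripe Mango",
                  "Grapes are juicy", "Oranges are citric"]}
)

# Lookup table: fruit name -> color (hypothetical helper name)
color_by_name = dict(zip(fruit_df["Name"], fruit_df["Color"]))

# One pattern matching any fruit name as a whole word; names are escaped
pattern = re.compile(r"\b(" + "|".join(map(re.escape, color_by_name)) + r")\b")

def add_color(match):
    """Return 'Color Name' for the fruit name that was matched."""
    name = match.group(1)
    return f"{color_by_name[name]} {name}"

# Sentences without any fruit name come back unchanged
sentence_df["Result"] = sentence_df["Sentence"].map(
    lambda s: pattern.sub(add_color, s)
)
print(sentence_df)
```

Because every sentence is scanned once, a color that happens to also be a fruit name cannot be replaced a second time.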

edited Mar 8, 2021 at 16:49

answered Mar 8, 2021 at 16:37


2 Comments

Animeartist Over a year ago

Thank you for the solution! This approach is less time consuming than my current approach.

Shubham Sharma Over a year ago

@Animeartist Happy coding!

Serge Ballesta · Accepted Answer · 2021-03-08 16:33:00Z

4

The problem is that you call compare once for each row of fruit_df, but every call reads the same input (the original Sentence column) and overwrites Result.

I have just added some debugging prints to the compare function to understand what happens:

def compare(name, color):
    print(name, color)
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    print(sentence_df['Result'])

and got:

Apple Red
0    I like Red Apple, Mango, Grapes
1                               None
2                               None
Name: Result, dtype: object
Mango Yellow
0    I like Apple, Yellow Mango, Grapes
1              I like ripe Yellow Mango
2                                  None
Name: Result, dtype: object
Grapes Green
0    I like Apple, Mango, Green Grapes
1                                 None
2               Green Grapes are juicy
Name: Result, dtype: object

So you successfully add the color when the fruit is present, but search returns None when it is not, and each pass starts again from the original column, so only the result of the last pass is kept.
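The None values come from plain Python behavior: a function that reaches its end without hitting a return statement returns None. A minimal sketch with the same shape as the original search (the function name search_without_fallback is made up here, and re.sub stands in for the index-insertion logic):

```python
import re

def search_without_fallback(desc, name, color):
    pattern = r"\b" + re.escape(name) + r"\b"
    # A value is returned only inside the if, as in the original search()
    if re.search(pattern, desc):
        return re.sub(pattern, color + " " + name, desc)
    # No return on this path, so Python returns None implicitly

print(search_without_fallback("I like ripe Mango", "Mango", "Yellow"))
print(search_without_fallback("Grapes are juicy", "Mango", "Yellow"))
```

The first call prints the modified sentence; the second prints None because the sentence does not contain the fruit.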

How to fix:

First add a missing return desc in search, to avoid the None results

def search(desc, name, color):

    if re.findall(r"\b" + name + r"\b", desc):
        ...
        new_desc = ''.join(arr)
        return new_desc
    return desc

Then initialize sentence_df['Result'] before applying compare, and use that column as the input of each pass:

def compare(name, color):
    sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))

sentence_df['Result'] = sentence_df['Sentence']
fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)

This finally gives the expected result:

The final result is: 
0    I like Red Apple, Yellow Mango, Green Grapes
1                        I like ripe Yellow Mango
2                          Green Grapes are juicy
Name: Result, dtype: object
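Putting both fixes together, a self-contained version could look like the sketch below. It keeps the index-insertion logic from the question, but makes two assumptions of its own: it uses the standard library re module with re.escape on the fruit name, and it replaces fruit_df.apply with a plain itertuples loop, since the calls are only made for their side effect.

```python
import re
import pandas as pd

fruit_df = pd.DataFrame(
    [["Apple", "Red"], ["Mango", "Yellow"], ["Grapes", "Green"]],
    columns=["Name", "Color"],
)
sentence_df = pd.DataFrame(
    ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"],
    columns=["Sentence"],
)

def search(desc, name, color):
    """Insert color before every whole-word occurrence of name in desc."""
    pattern = r"\b" + re.escape(name) + r"\b"
    starts = [m.start() for m in re.finditer(pattern, desc)]
    chars = list(desc)
    # Insert from the right so the earlier start indexes stay valid
    for idx in sorted(starts, reverse=True):
        chars.insert(idx, color + " ")
    # With no match nothing was inserted, so desc is returned unchanged
    return "".join(chars)

# Start from the original sentences, then refine Result once per fruit
sentence_df["Result"] = sentence_df["Sentence"]
for row in fruit_df.itertuples(index=False):
    sentence_df["Result"] = sentence_df["Result"].apply(
        lambda s: search(s, row.Name, row.Color)
    )

print(sentence_df["Result"])
```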

answered Mar 8, 2021 at 16:33


2 Comments

Shubham Sharma Over a year ago

Nice explanation!

Animeartist Over a year ago

Thank you for the solution! Initializing the result column did the trick.

Pygirl · Accepted Answer · 2021-03-08 16:49:10Z

1

Create a dict that maps each fruit name to "Color Name", then pass it to replace with regex=True. Try:

di = {fr: f"{co} {fr}" for fr, co in fruit_df.values}
res = sentence_df.replace(di, regex=True)

res:

                                       Sentence
0  I like Red Apple, Yellow Mango, Green Grapes
1                      I like ripe Yellow Mango
2                        Green Grapes are juicy
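One thing to keep in mind: with regex=True the dict keys are treated as patterns, and a bare fruit name also matches inside longer words. If whole-word matching is needed, as in the question, the keys can be wrapped in \b. A small sketch with made-up data (the sentence and the names plain, bounded, plain_result and bounded_result are only for illustration):

```python
import re
import pandas as pd

fruit_df = pd.DataFrame([["Grapes", "Green"]], columns=["Name", "Color"])
sentence_df = pd.DataFrame({"Sentence": ["Grapeseed oil and Grapes"]})

# Keys are bare names: "Grapes" also matches the start of "Grapeseed"
plain = {name: f"{color} {name}" for name, color in fruit_df.values}

# Keys are escaped and wrapped in word boundaries: whole words only
bounded = {
    rf"\b{re.escape(name)}\b": f"{color} {name}"
    for name, color in fruit_df.values
}

plain_result = sentence_df.replace(plain, regex=True)["Sentence"][0]
bounded_result = sentence_df.replace(bounded, regex=True)["Sentence"][0]

print(plain_result)
print(bounded_result)
```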

answered Mar 8, 2021 at 16:49

