I have a fruits dataframe with columns: (Name, Color) and a sentence dataframe with columns: (Sentence).
fruits dataframe:

   Name        Color
0  Apple       Red
1  Mango       Yellow
2  Grapes      Green
3  Strawberry  Pink
sentence dataframe:

   Sentence
0  I like Apple, Mango, Grapes
1  I like ripe Mango
2  Grapes are juicy
3  Oranges are citric
I need to compare each row of the fruits dataframe with every row of the sentence dataframe. If the fruit name appears in the sentence as a whole word, its color should be inserted immediately before the fruit name in that sentence.
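To make the requirement concrete, here is a minimal sketch of the transformation for a single sentence, using only the standard-library `re` module. The helper name `add_colors` and the `colors` dict are illustrative assumptions and not part of my actual code; `re.escape` is there only to guard against names that contain regex metacharacters. In this sketch a sentence with no fruit name is returned unchanged.

```python
import re

# Hypothetical lookup built from the fruits dataframe (Name -> Color).
colors = {"Apple": "Red", "Mango": "Yellow", "Grapes": "Green", "Strawberry": "Pink"}

def add_colors(sentence, colors):
    """Prefix every whole-word fruit name in `sentence` with its color."""
    if not colors:
        return sentence
    # One alternation of all names; \b restricts matches to whole words.
    pattern = r"\b(" + "|".join(re.escape(name) for name in colors) + r")\b"
    return re.sub(pattern, lambda m: colors[m.group(1)] + " " + m.group(1), sentence)

print(add_colors("I like Apple, Mango, Grapes", colors))
# → I like Red Apple, Yellow Mango, Green Grapes
print(add_colors("Oranges are citric", colors))
# → Oranges are citric
```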
This is what I have done using dataframe.apply():
import pandas as pd
import regex as re

# create fruit dataframe
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green'], ['Strawberry', 'Pink']]
fruit_df = pd.DataFrame(fruit_data, columns=['Name', 'Color'])
print(fruit_df)

# create sentence dataframe
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy', 'Oranges are citric']
sentence_df = pd.DataFrame(sentence, columns=['Sentence'])
print(sentence_df)

def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())
        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")
        new_desc = ''.join(arr)
        return new_desc

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print("The final result is: ")
print(sentence_df['Result'])
The result I am getting is:

   Sentence                     Result
0  I like Apple, Mango, Grapes  None
1  I like ripe Mango            None
2  Grapes are juicy             None
3  Oranges are citric           None
The expected result:

   Sentence                     Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1  I like ripe Mango            I like ripe Yellow Mango
2  Grapes are juicy             Green Grapes are juicy
3  Oranges are citric
I also tried iterating through fruit_df using itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
    result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
    print(result)
I can't understand why the value returned by the search function is not stored in the new column. Is this the right way to do it, or am I missing something?
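In case it helps narrow things down, here is a stripped-down check that does not need pandas. It is only a sketch under assumptions: `search_like_mine` is a hypothetical, simplified stand-in for my `search` function (it uses the standard-library `re` and `re.sub` instead of index insertion), and a plain list stands in for the `Result` column. It seems to point at two things: the function returns None whenever the `if` branch is skipped, and reassigning the result on every iteration keeps only the output for the last fruit.

```python
import re

def search_like_mine(desc, name, color):
    # Like my search(): the only return statement sits inside the if,
    # so a sentence without a match falls through and yields None.
    if re.search(r"\b" + re.escape(name) + r"\b", desc):
        return re.sub(r"\b" + re.escape(name) + r"\b", color + " " + name, desc)

fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green"), ("Strawberry", "Pink")]
sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango",
             "Grapes are juicy", "Oranges are citric"]

result = None
for name, color in fruits:
    # Each pass starts again from the original sentences and replaces
    # `result` wholesale, just as each compare() call replaces the column.
    result = [search_like_mine(s, name, color) for s in sentences]

# Strawberry is processed last and matches no sentence.
print(result)  # → [None, None, None, None]
```

If that reading is right, the colors added for earlier fruits are never carried forward, because every pass reads from the untouched Sentence values rather than from the previous pass's output.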