I have two pandas DataFrames in Python, set up as follows:
Dataframe 1:
ID Paragraph
1 'A B C D E'
2 'A F G H L'
3 'B J P Q W'
4 'G F D S A'
Where Paragraph is a string of multiple words.
Dataframe 2:
ID Name Words
1 First ['A', 'F']
2 Second ['B', 'Z']
3 Third ['P', 'Q']
4 Fourth ['H', 'J']
Name is a string label identifying the Words, and Words is a list of strings.
I want an expression that identifies which Paragraphs in Dataframe 1 contain Words from Dataframe 2, and stores the matching Names in a new column in Dataframe 1. The new column should hold a list of every Name for which at least one of its Words occurs in the paragraph. The order does not matter, and the list must contain no duplicates.
For example:
New Dataframe 1:
ID Paragraph Names
1 'A B C D E' [First, Second]
2 'A F G H L' [First, Fourth]
3 'B J P Q W' [Second, Third, Fourth]
4 'G F D S A' [First]
So far I can only come up with a solution that uses deeply nested for loops and takes a very long time to execute. Is there a solution with a shorter computation time? My thinking is that it might use loc and/or lambda functions.
Any help would be greatly appreciated!
Let me know if there is anything I need to clarify.
English is not my first language, so I can try to explain more if needed.
Thank you
Here is the code for the dummy dataframes:
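import pandas as pd

# Dataframe 1: one paragraph (space-separated words) per row
data_1 = {'Paragraph': ['A B C D E', 'A F G H L', 'B J P Q W', 'G F D S A']}
df_1 = pd.DataFrame(data_1)

# Dataframe 2: a Name and the list of Words it stands for
data_2 = {'Name': ['First', 'Second', 'Third', 'Fourth'],
          'Words': [['A', 'F'], ['B', 'Z'], ['P', 'Q'], ['H', 'J']]}
df_2 = pd.DataFrame(data_2)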
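To make the goal concrete, here is a rough sketch of the kind of thing I imagine could replace the nested loops. It builds a word-to-Names lookup once and then applies it to each paragraph. It assumes the words in a Paragraph are separated by whitespace and must match exactly (case-sensitive, whole word); word_to_names and names_for are just illustrative names I made up, and this may well not be the fastest option.

```python
import pandas as pd

df_1 = pd.DataFrame({'Paragraph': ['A B C D E', 'A F G H L', 'B J P Q W', 'G F D S A']})
df_2 = pd.DataFrame({'Name': ['First', 'Second', 'Third', 'Fourth'],
                     'Words': [['A', 'F'], ['B', 'Z'], ['P', 'Q'], ['H', 'J']]})

# Build a lookup once: word -> set of Names whose Words list contains it.
word_to_names = {}
for name, words in zip(df_2['Name'], df_2['Words']):
    for word in words:
        word_to_names.setdefault(word, set()).add(name)

def names_for(paragraph):
    """Return the de-duplicated Names matched by any word in the paragraph."""
    found = set()
    for word in paragraph.split():  # assumption: whitespace-separated words
        found |= word_to_names.get(word, set())
    return sorted(found)  # sorted only to make the output deterministic

df_1['Names'] = df_1['Paragraph'].apply(names_for)
print(df_1)
```

Each paragraph word costs one dictionary lookup, so the work grows with the total number of words rather than with paragraphs times Names times Words.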