I have a pandas dataframe like this:
import pandas as pd

data = {
    'col1': ['New Zealand', 'Gym', 'United States'],
    'col2': ['Republic of South Africa', 'Park', 'United States of America'],
}
df = pd.DataFrame(data)
print(df)
            col1                      col2
0    New Zealand  Republic of South Africa
1            Gym                      Park
2  United States  United States of America
I also have a sentence that might contain values from any of the dataframe's columns. I want to find which column values appear in the given sentence, and which column each of them comes from. The similar solutions I have seen check whether the sentence appears inside the column values, not the other way around. Currently, I am doing it like this:
def find_match(df, sentence):
    """Return whether anything matched, the matching values, and the columns they came from."""
    arr = []
    cols = []
    flag = False
    for i, row in df.iterrows():
        if row['col1'].lower() in sentence.lower():
            arr.append(row['col1'])
            cols.append('col1')
            flag = True
        elif row['col2'].lower() in sentence.lower():
            arr.append(row['col2'])
            cols.append('col2')
            flag = True
    return flag, arr, cols
sentence="I live in the United States"
find_match(df,sentence) # returns (True, ['United States'], ['col1'])
I want a faster, more pythonic way to do this, because the row-by-row loop takes a lot of time on a fairly large dataframe.
I cannot use .isin() because it expects a list of strings and compares each column value against the whole sentence. I have also tried the following, but it raises an error:
df.loc[df['col1'].str.lower() in sentence]  # raises TypeError: the left operand of `in` must be a string, not a Series
Any help will be highly appreciated. Thanks!
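One possible direction, offered as a sketch rather than a definitive answer: the `in` operator needs a plain string on its left side, so the substring test has to be applied to each cell value individually. Iterating over each column's values directly avoids the per-row overhead of `iterrows()`, and the sentence can be lowercased once up front. The name `find_matches` is hypothetical. Note two assumed behavior changes from `find_match` above: results come back column by column instead of row by row, and a row can contribute matches from both columns because there is no `elif`.

```python
import pandas as pd


def find_matches(df, sentence):
    """Return (found, values, columns) for every cell whose text occurs in sentence."""
    needle = sentence.lower()  # lowercase the sentence once, not once per row
    values, columns = [], []
    for col in df.columns:
        # Skip missing cells, then test each remaining value as a substring.
        for value in df[col].dropna():
            if str(value).lower() in needle:
                values.append(value)
                columns.append(col)
    return bool(values), values, columns


df = pd.DataFrame({
    'col1': ['New Zealand', 'Gym', 'United States'],
    'col2': ['Republic of South Africa', 'Park', 'United States of America'],
})
result = find_matches(df, "I live in the United States")
print(result)  # (True, ['United States'], ['col1'])
```

This is still a linear scan over every cell, so it should reduce the constant overhead without changing the overall complexity. How much faster it is on a given dataframe would need to be measured.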