Let's say I have a string stored in `text`. I want to compare this string against lists of keywords stored in a DataFrame and check whether the text contains words such as car, plane, etc. For each keyword found, I want to add 1 to the score of the corresponding topic.
| topic | keywords |
|------------|-------------------------------------------|
| Vehicles | [car, plane, motorcycle, bus] |
| Electronic | [television, radio, computer, smartphone] |
| Fruits | [apple, orange, grape] |
I have written the following code, but I don't really like it, and it doesn't work as intended.

    def foo(text, df_lex):
        keyword = []
        score = []
        for lex_list in df_lex['keyword']:
            print(lex_list)
            val = 0
            for lex in lex_list:
                if lex in text:
                    val =+ 1
            keyword.append(key)
            score.append(val)
        score_list = pd.DataFrame({
            'keyword':keyword,
            'score':score
        })

Is there a way to do this efficiently? I don't like having too many loops in my program, as they don't seem to be very efficient. I will elaborate more if needed. Thank you.
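For reference, two things in the function above would stop it from working: `val =+ 1` assigns `+1` on every hit instead of incrementing, and `key` is never defined. Below is a minimal sketch of a more compact version, assuming the `keywords` column already holds Python lists. The names `score_topics` and the sample `text` are hypothetical, chosen only for illustration. Note that `Series.apply` still iterates in Python, so this is mainly tidier rather than fundamentally faster.

```python
import pandas as pd

# Hypothetical lexicon mirroring the table in the question;
# the keywords are assumed to already be Python lists.
df_lex = pd.DataFrame({
    "topic": ["Vehicles", "Electronic", "Fruits"],
    "keywords": [
        ["car", "plane", "motorcycle", "bus"],
        ["television", "radio", "computer", "smartphone"],
        ["apple", "orange", "grape"],
    ],
})


def score_topics(text, df_lex):
    """Return one row per topic with the number of its keywords found in text."""
    # Substring test per keyword; each keyword counts at most once.
    scores = df_lex["keywords"].apply(
        lambda kws: sum(kw in text for kw in kws)
    )
    return pd.DataFrame({"topic": df_lex["topic"], "score": scores})


# Hypothetical sample text for illustration.
text = "I rode a motorcycle to buy a car, then checked my smartphone."
result = score_topics(text, df_lex)
print(result)
```

Because the check is a plain substring test, a keyword such as `car` would also be counted inside a longer word like `scare`.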
EDIT: For example, suppose my text is the following. I kept it simple so that it's easy to follow.
I went to the showroom riding a motorcycle to buy a car today. Unluckily, when I checked my smartphone, I got a message to go home.
So, my expected output would be something like this.
| topic | score |
|------------|-------|
| Vehicles | 2 |
| Electronic | 1 |
| Fruits | 0 |
EDIT2: I finally found my own solution with some help from @jezrael.

    # Assumes each keywords cell is a string such as "[car, plane]"; turn it into a list of words.
    df['keywords'] = df['keywords'].str.strip('[]').str.split(', ')
    text = 'I went to the showroom riding a motorcycle to buy a car today. Unluckily, when I checked my smartphone, I got a message to go home.'
    score_list = []
    for lex in df['keywords']:
        val = 0
        for w in lex:
            # Substring test: each keyword counts at most once per topic.
            if w in text:
                val += 1
        score_list.append(val)
    df['score'] = score_list
    print(df)

And it prints exactly what I need.
`scare` matches `car`. Do you want that behavior or not? If not, do you want `cars` to match `car` (stemming) or is that not important?

`car` with `car` not `cars`. My actual dataset isn't in English anyway. That kind of plural as in `cars` doesn't exist here.

`nice car` and `car`, yes there is a possibility for both. What do you mean by `text` in expected output? My expected output would be the scores for the appearance of those words in the keyword. I hope that's clear enough?
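The `scare`/`car` point raised in the comments can be handled by matching whole words instead of substrings. The following is only a sketch under that assumption: `count_whole_word_hits` is a hypothetical helper name, and it relies on the standard `re` module's `\b` word-boundary anchor. It performs no stemming, so `cars` does not count as `car`.

```python
import re


def count_whole_word_hits(text, keywords):
    """Count how many keywords occur in text as whole words (each at most once)."""
    hits = 0
    for kw in keywords:
        # \b marks a word boundary; re.escape protects keywords that
        # contain regex metacharacters.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text):
            hits += 1
    return hits


vehicles = ["car", "plane", "motorcycle", "bus"]

# "scare" contains "car" only as a substring, so it is not counted.
print(count_whole_word_hits("That movie gave me a scare.", vehicles))

# "car" is counted once; "cars" does not add a second hit.
print(count_whole_word_hits("I bought a nice car and two cars.", vehicles))
```

Multi-word keywords such as `nice car` work the same way, since the boundary is only checked at the two ends of the keyword.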