5

I have a list of strings that I would like to search for a word combination. Then I want to delete a string from the list if the combination is not there. Is there a Python list comprehension that would work?

word_list = ["Dogs love ice cream", "Cats love balls", "Ice cream", "ice cream is good with pizza", "cats hate ice cream"]

keep_words = ["Dogs", "Cats"] 

Delete_word = ["ice cream"]

Delete strings that have "ice cream" in them, but if "dogs" or "cats" is in the sentence, keep it.

Desired_output = ["Dogs love ice cream", "Cats love balls", "cats hate ice cream"] 

I was trying this code and also tried combining conditions with and / or, but cannot get the combination right.

output_list = [x for x in  word_list if "ice cream" not in x]
4
  • Does "ice cream" need to be together? For example, would this sentence be considered valid: "keep cream over the ice"? Commented Feb 6, 2018 at 20:26
  • I was thinking of it together, "ice cream", but very good point. Commented Feb 6, 2018 at 20:34
  • The last one contains cat, not cats! Commented Feb 6, 2018 at 20:38
  • Good spot, Kasramvd. It should be cats; I will edit. Commented Feb 6, 2018 at 20:42

3 Answers

7

Here's a list comprehension solution:

[x for x in word_list if any(kw.lower() in x.lower() for kw in keep_words) 
 or all(dw.lower() not in x.lower() for dw in Delete_word)]
# ['Dogs love ice cream', 'Cats love balls', 'cats hate ice cream']

This also adds flexibility for multiple words in the delete words list.

Explanation

Iterate over the list and keep the string if either of the following is True:

  • Any of the keep words are in x
  • None of the delete words are in x

I presume from your example that you wanted this to be case-insensitive, so all comparisons are made on the lower-cased versions of the strings.

Two helpful functions are any() and all().
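
As a rough sketch of how the two conditions combine, the same logic can be spelled out in a small function. The name filter_sentences is hypothetical (it is not part of the answer above), and substring matching is assumed, just as in the comprehension.

```python
def filter_sentences(sentences, keep, delete):
    """Keep a sentence if it has any keep word or none of the delete words."""
    result = []
    for s in sentences:
        low = s.lower()
        # any() is True as soon as one keep word occurs in the sentence
        has_keep = any(k.lower() in low for k in keep)
        # all() is True only if every delete word is absent
        has_no_delete = all(d.lower() not in low for d in delete)
        if has_keep or has_no_delete:
            result.append(s)
    return result


word_list = ["Dogs love ice cream", "Cats love balls", "Ice cream",
             "ice cream is good with pizza", "cats hate ice cream"]
keep_words = ["Dogs", "Cats"]
Delete_word = ["ice cream"]

print(filter_sentences(word_list, keep_words, Delete_word))
# ['Dogs love ice cream', 'Cats love balls', 'cats hate ice cream']
```

Note that any() of an empty iterable is False and all() of an empty iterable is True, so an empty delete list keeps every sentence.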


1 Comment

Pault, very interesting. I didn't think about any() and all(). Thank you so much for the help.
5

As an optimized approach, you can put your keep_words and delete_words in sets and use itertools.filterfalse() to filter the list:

In [47]: from itertools import filterfalse

In [48]: keep_words = {"dogs", "cats"}

In [49]: delete_words = {"ice cream"}

In [50]: def key(x):
   ....:     words = x.lower().split()
   ....:     # True means "drop this string"
   ....:     return keep_words.isdisjoint(words) or not delete_words.isdisjoint(words)
   ....: 

In [51]: list(filterfalse(key, word_list))
Out[51]: ['Dogs love ice cream', 'Cats love balls', 'cats hate ice cream']
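
One caveat with the token-based check: x.lower().split() yields single words, so a two-word entry such as "ice cream" never equals any token, and the delete_words test never matches. The output above comes out right only because key() drops every string that has no keep word; a string with neither kind of word (for example "Birds sing") would be dropped as well, which may not be what the question intends. A possible variant is sketched below, assuming keep words are matched as whole tokens and delete phrases as substrings. The names should_drop and delete_phrases, and the extra "Birds sing" entry, are hypothetical additions for illustration.

```python
from itertools import filterfalse

keep_words = {"dogs", "cats"}
delete_phrases = {"ice cream"}  # may contain multi-word phrases


def should_drop(sentence):
    """Return True if the sentence has a delete phrase and no keep word."""
    text = sentence.casefold()
    # Whole-token match for keep words (set lookup on the split tokens)
    has_keep = not keep_words.isdisjoint(text.split())
    # Substring match for delete phrases, so "ice cream" can be found
    has_delete = any(p in text for p in delete_phrases)
    return has_delete and not has_keep


word_list = ["Dogs love ice cream", "Cats love balls", "Ice cream",
             "ice cream is good with pizza", "cats hate ice cream",
             "Birds sing"]

result = list(filterfalse(should_drop, word_list))
print(result)
# ['Dogs love ice cream', 'Cats love balls', 'cats hate ice cream', 'Birds sing']
```

Splitting on whitespace leaves punctuation attached to tokens ("dogs," would not match "dogs"), so real text may need extra cleanup first.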

4 Comments

Kasramvd, a very, very interesting technique. I like it. Thank you for the help.
return keep_words.isdisjoint(words) or Delete_word.intersection(words) could be improved so that it does not build the intersection, like this: return keep_words.isdisjoint(words) or not Delete_word.isdisjoint(words)
@Jean-FrançoisFabre That's even better!
Also, x.lower() can be replaced by x.casefold(), so that the German "ß" is folded to "ss". It is more powerful than a plain str.lower().
1
>>> list(filter(lambda x: not any(i in x for i in Delete_word)
...                       or  any(i in x for i in keep_words), word_list))
['Dogs love ice cream', 'Cats love balls', 'Ice cream']

Modify this accordingly for a case-insensitive implementation.
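
One way the case-insensitive version might look, as a minimal sketch: fold both the sentence and the words with str.casefold() before comparing. The helper name wanted and the pre-folded lists are hypothetical, introduced only for this example.

```python
word_list = ["Dogs love ice cream", "Cats love balls", "Ice cream",
             "ice cream is good with pizza", "cats hate ice cream"]
keep_words = ["Dogs", "Cats"]
Delete_word = ["ice cream"]

# Fold the words once, rather than on every comparison
keep_folded = [w.casefold() for w in keep_words]
delete_folded = [w.casefold() for w in Delete_word]


def wanted(sentence):
    """Keep a sentence with no delete word, or with at least one keep word."""
    s = sentence.casefold()
    return (not any(d in s for d in delete_folded)
            or any(k in s for k in keep_folded))


output_list = list(filter(wanted, word_list))
print(output_list)
# ['Dogs love ice cream', 'Cats love balls', 'cats hate ice cream']
```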
