
I would appreciate help with what is probably a simple problem. I have a long list of words in the form ['word', 'another', 'word', 'and', 'yet', 'another']. I want to compare these words against a list of target words that I specify, to find out which of the targets are contained in the first list.

I would like to output which of my "search" words are contained in the first list and how many times each one appears. I tried something like list(set(a).intersection(set(b))), but it splits up the words and compares letters instead.
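A likely cause, offered as an assumption because the original code isn't shown: one of the operands was a single string rather than a list of words. Calling set() on a string iterates over its characters, so the intersection ends up comparing letters. A minimal sketch of both cases (the variable names are illustrative):

```python
a = ['word', 'another', 'word', 'and', 'yet', 'another']
b = ['word', 'and', 'but']

# With two lists of words, set intersection compares whole words.
common = sorted(set(a) & set(b))
print(common)  # ['and', 'word']

# If b is a single string instead, set() iterates over its characters,
# so no whole word from a can match.
mixed = set(a) & set('word and but')
print(mixed)  # set()

# Splitting the string first turns it back into a list of words.
fixed = sorted(set(a) & set('word and but'.split()))
print(fixed)  # ['and', 'word']
```

Note that an intersection only says which words are shared; it does not give frequencies, which is why the answers below count occurrences separately.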

How can I supply a list of words to compare with the existing long list? And how can I output the matches and their frequencies? Thank you so much for your time and help.

Comment: Can you post the code you tried? set(['word', 'another']) evaluates to set(['word', 'another']) and does not split the words into letters. (Commented Mar 14, 2013 at 10:52)

2 Answers

>>> lst = ['word', 'another', 'word', 'and', 'yet', 'another']
>>> search = ['word', 'and', 'but']
>>> [(w, lst.count(w)) for w in sorted(set(lst)) if w in search]
[('and', 1), ('word', 2)]

This code iterates through the unique elements of lst and, for each element that is in the search list, adds the word together with its number of occurrences to the resulting list.
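One caveat: lst.count(w) rescans the whole list once for every unique word, which can get slow for a long list. A minimal single-pass sketch under the same inputs (counts and search_set are illustrative names, not from the answer) counts only the search words while walking the list once, and sorts the result because iterating over a set has no guaranteed order:

```python
lst = ['word', 'another', 'word', 'and', 'yet', 'another']
search = ['word', 'and', 'but']

search_set = set(search)  # set gives fast membership tests
counts = {}
for w in lst:
    if w in search_set:
        # Increment the running total for this search word.
        counts[w] = counts.get(w, 0) + 1

result = sorted(counts.items())
print(result)  # [('and', 1), ('word', 2)]
```

Like the one-liner, this omits search words that never occur (here 'but').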


Preprocess your list of words with a Counter:

from collections import Counter
a = ['word', 'another', 'word', 'and', 'yet', 'another']
c = Counter(a)
# c == Counter({'word': 2, 'another': 2, 'and': 1, 'yet': 1})

Now you can iterate over your list of search words and look each one up in this Counter (a dict subclass). The value is the word's number of appearances in the original list, and get(w, 0) returns 0 for a word that does not appear:

words = ['word', 'no', 'another']

for w in words:
    print(w, c.get(w, 0))

which prints:

word 2
no 0
another 2

or collect the results in a list:

[(w, c.get(w, 0)) for w in words]
# returns [('word', 2), ('no', 0), ('another', 2)]
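A Counter also returns 0 for a missing key when indexed directly (without adding the key), so c[w] behaves like c.get(w, 0). A small sketch reusing the same inputs, which additionally keeps only the search words that actually occur:

```python
from collections import Counter

a = ['word', 'another', 'word', 'and', 'yet', 'another']
words = ['word', 'no', 'another']
c = Counter(a)

# c[w] is 0 for words not in a, just like c.get(w, 0).
all_counts = [(w, c[w]) for w in words]
# Keep only the search words that appear at least once.
found = [(w, c[w]) for w in words if c[w] > 0]
print(all_counts)  # [('word', 2), ('no', 0), ('another', 2)]
print(found)       # [('word', 2), ('another', 2)]
```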

Comment:

Thank you very much. Both solutions seem to work, but my code treats the input as [('S'), ('t'), ('o'), ('c'), ('k')] when I pass Stock as sys.argv[2]. How can I supply more words to a comparison list when I run the program? With both of your suggested solutions, it compares the letters rather than the entire word:

conll=open(sys.argv[1],'r')
targetword=str(sys.argv[2])
vocab=[]
c = Counter(vocab)
print c
for w in targetword: print w, c.get(w, 0)
print [(w, vocab.count(w)) for w in set(vocab) if w in targetword]
print targetword
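In the posted code, targetword is a single string, so `for w in targetword` loops over its letters; vocab is also never filled from the opened file, so the Counter is empty. A minimal sketch of one way around both, assuming the target words are passed as separate command-line arguments (sys.argv[2:] is already a list of whole words) and that the corpus file is whitespace-separated text. The script name count_words.py and the helper count_targets are hypothetical:

```python
import sys
from collections import Counter


def count_targets(tokens, targets):
    """Return (target, frequency) pairs, one per target word."""
    c = Counter(tokens)
    return [(t, c[t]) for t in targets]


def main(argv):
    # Hypothetical usage: python count_words.py corpus.txt Stock market price
    if len(argv) < 3:
        print('usage: count_words.py CORPUS_FILE WORD [WORD ...]')
        return
    # argv[2:] is a list of whole words, not the letters of one string.
    path, targets = argv[1], argv[2:]
    with open(path) as f:
        # Assumption: tokens are separated by whitespace. For a CoNLL file
        # you would instead take the token column from each non-blank line.
        tokens = f.read().split()
    for word, freq in count_targets(tokens, targets):
        print(word, freq)


if __name__ == '__main__':
    main(sys.argv)
```

If you prefer a single argument, something like sys.argv[2].split(',') would turn "Stock,market" into ['Stock', 'market'].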
