I have the Python script below, which counts the number of words in a text file:

from collections import Counter

def main():
    with open(TEXT_FILE) as f:
        wordscounts = Counter(f.read().split())
        print(wordscounts)

This gives me:

Counter({'invoice': 10, 'USD': 8, 'order': 5})

Now I want to write these words to another text file, dictionary.txt, like this:

invoice 10
USD 8
order 5

The next time I process a file and check the word frequencies, I might get, for example:

 Counter({'invoice': 2, 'USD': 1, 'tracking': 3})

It should add the counts to the words already in the file and append the new words.

So dictionary.txt becomes:

invoice 12
USD 9
order 5
tracking 3

If I try to iterate through wordscounts, I only get the actual word:

 for index, wordcount in enumerate(wordscounts):
     print(wordcount)

gives me:

invoice
USD
order

But not the word count.

  • There are some missing steps that make it unclear what you mean by wordscounts. For example, is it a Counter object? If so, you are iterating through the object incorrectly. If you are iterating over a dictionary, you are also looping incorrectly. Commented May 7, 2019 at 11:44
  • I have added from collections import Counter to my question. Can you elaborate on why I am looping incorrectly? Commented May 7, 2019 at 11:45
  • I see that you did clarify what wordscounts was. But enumerate(wordscounts) does not give you the values of the counter. It is the same principle as looping over a dict for keys and values. The answer below shows how one can do it. Commented May 7, 2019 at 11:46
  • A Counter is a dict subclass, so you can use dictionary methods like for word, wordcount in wordscounts.items(): Commented May 7, 2019 at 11:46
  • You may want to look at stackoverflow.com/questions/28153549/… Commented May 7, 2019 at 11:59

2 Answers

You need to look up each word's count in the Counter, which behaves like a dictionary. A small example:

from collections import Counter
wordcount_1 = Counter("an example test test test".split())
wordcount_2 = Counter("another example test".split())

for word in wordcount_1:
    print(word, wordcount_1[word])
# Output on Python 3.7+ (insertion order):
# an 1
# example 1
# test 3

If you want to build the sum in memory (as mentioned here), use

total = sum([wordcount_1, wordcount_2], Counter())
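
As a small sketch of the same idea applied to the counts from the question: two counters can be summed with an empty Counter as the start value, or merged with `+` or `update()`. The variable names here are only illustrative.

```python
from collections import Counter

# Counts taken from the question
first_run = Counter({'invoice': 10, 'USD': 8, 'order': 5})
second_run = Counter({'invoice': 2, 'USD': 1, 'tracking': 3})

# sum() with an empty Counter as the start value adds the counters together
total = sum([first_run, second_run], Counter())

# Equivalent alternatives: the + operator, or update() to add in place
also_total = first_run + second_run
in_place = Counter(first_run)   # copy, so first_run stays unchanged
in_place.update(second_run)

print(total)
```

All three results hold invoice 12, USD 9, order 5 and tracking 3, which matches the dictionary.txt the question asks for.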

4 Comments

Ah, this works perfectly! Do you have any idea regarding the last part of my question? Adding the words to another file - but only updating the word count if the word already exists.
If possible, keep the full word counter in memory and save it afterwards. If that's not possible because of the size of your problem, I would sort the saved file by word, which will speed up the search (and the check for whether a word is new).
Would you mind giving a simple example of this? I am brand new to Python, so still learning.
Added the easier in-memory variant, which I would try first.
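
For the file part discussed in these comments, a minimal sketch could look like the following. It assumes dictionary.txt holds one "word count" pair per line, as shown in the question, and that the whole file fits in memory. The helper names `load_counts`, `save_counts` and `update_dictionary` are made up for this example.

```python
from collections import Counter
from pathlib import Path


def load_counts(path):
    """Read 'word count' lines into a Counter; a missing file gives an empty one."""
    counts = Counter()
    file = Path(path)
    if file.exists():
        with file.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                # Split on the last run of whitespace: word on the left, count on the right
                word, count = line.rsplit(maxsplit=1)
                counts[word] += int(count)
    return counts


def save_counts(counts, path):
    """Write one 'word count' pair per line, highest count first."""
    with open(path, "w", encoding="utf-8") as f:
        for word, count in counts.most_common():
            f.write(f"{word} {count}\n")


def update_dictionary(new_counts, path="dictionary.txt"):
    """Add new_counts to the counts stored in path and rewrite the file."""
    total = load_counts(path)
    total.update(new_counts)  # adds to existing words, inserts new ones
    save_counts(total, path)
    return total
```

Calling `update_dictionary(wordscounts)` after each processed file would then keep dictionary.txt up to date. The file is rewritten in full each time, which is simple but only reasonable while the word list stays small.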

You can get the actual wordcount using:

for index, wordcount in enumerate(wordscounts):
     print(wordscounts[wordcount])

Printing wordcount only gives you the key, while printing wordscounts[wordcount] gives you the value.
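
Since the index from enumerate is not used here, a slightly shorter sketch is to loop over `items()`, which yields the word and its count together. The counts are the ones from the question.

```python
from collections import Counter

wordscounts = Counter({'invoice': 10, 'USD': 8, 'order': 5})

# items() yields (word, count) pairs, so no lookup by key is needed
for word, count in wordscounts.items():
    print(word, count)

# most_common() yields the same pairs sorted by descending count,
# which is handy for building the "word count" lines of dictionary.txt
lines = [f"{word} {count}" for word, count in wordscounts.most_common()]
```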
