I'm very new to Python, and I know my question is very simple, but I haven't found an existing question on SO yet.

I have an array containing string elements. I want to extract the distinct elements, count how many times each one appears, then sort them in descending order of count.

For example:

['ab', 'ab', 'ac']

then the output should be:

'ab' 2
'ac' 1

Also, I'm afraid I don't know the best way to store my output (in a map, a hash, or something like that? Again, I'm not sure).

Thanks for any help.

    Incidentally, this isn't an array, it's a list, or more generally a "sequence". In Python, "array" refers to a specific data type. Commented Jul 5, 2012 at 19:33

3 Answers

3

This can be done using the Counter class from the collections module.

from collections import Counter
x = ['ab', 'ab', 'ac']
counts = Counter(x)

counts stores the count information for each element; the full list of methods can be found in the documentation, but probably all you care about is that you can access counts directly by treating counts like a hash:

>>> counts['ab']
2
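
To illustrate the "treat it like a hash" point, here is a minimal sketch using the sample list from the question. The key 'zz' and the variable names missing, has_ab and as_dict are only illustrative; the behaviour shown (a missing key reports 0 rather than raising KeyError) is standard for Counter.

```python
from collections import Counter

counts = Counter(['ab', 'ab', 'ac'])

# A key that was never seen reports a count of 0 instead of raising KeyError.
missing = counts['zz']

# Membership tests work as they do for a plain dict.
has_ab = 'ab' in counts

# A Counter can be converted to an ordinary dict if one is needed.
as_dict = dict(counts)

print(counts['ab'], missing, has_ab, as_dict)
```

Looking up a missing key this way does not add it to the Counter, so it is safe to probe for elements that may not be present.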

4 Comments

Thanks and +1, but how can I then sort by frequency in descending order?
The most_common method will do this. counts.most_common() gives a list ordered from most frequent to least frequent of tuples of the form (elem,count). You could iterate over this with e.g. for elem, count in counts.most_common():.
Thank you. I will accept this answer. By the way, what is the data type of counts?
It's a Counter, a specific class from the collections module. You can read about its methods in the documentation I linked. But you can treat it like a dict in many contexts (which is the Python version of a map or hash).
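
Putting the comments above together, a short sketch of the sorted output the question asks for might look like this. It assumes the same sample list; the names ranked and is_dict are only for illustration.

```python
from collections import Counter

counts = Counter(['ab', 'ab', 'ac'])

# most_common() returns (element, count) pairs, highest count first.
ranked = counts.most_common()

# Print each element next to its count, as in the question's example.
for elem, count in ranked:
    print(repr(elem), count)

# Counter is a subclass of dict, which is why dict-style access works.
is_dict = isinstance(counts, dict)
```

Passing a number, e.g. counts.most_common(1), limits the result to that many of the most frequent elements.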
1

This is a classic problem, the so-called "Word Count" problem. You would probably want to use a dictionary, Python's built-in mapping type, which offers constant-time lookup on average.

Declare it like this:

counts = {}

You can then iterate over your list of tokens with a loop body resembling the following:

if token not in counts:
    counts[token] = 1
else:
    counts[token] += 1

When you're done, you end up with a dictionary containing words as keys and frequencies as values.

The following documentation is relevant: http://docs.python.org/release/2.5.2/lib/typesmapping.html
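
For completeness, the dictionary approach described in this answer could be sketched end to end as follows. The list name tokens and the sorting step are assumptions added to cover the "descending order" part of the question; they are not part of the original answer.

```python
tokens = ['ab', 'ab', 'ac']  # sample input taken from the question

counts = {}
for token in tokens:
    if token not in counts:
        counts[token] = 1   # first time this token is seen
    else:
        counts[token] += 1  # seen before, so bump its count

# Sort the (token, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

for token, count in ranked:
    print(repr(token), count)
```

This does by hand what Counter and most_common do, which can be useful for seeing how the counting works.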


1

There is a library called NLTK: http://nltk.org/.

EDIT: I found something better:

You can look here too - real word count in NLTK.

Code example from the above link:

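    >>> from collections import Counter
    >>> text = ['this', 'is', 'a', 'sentence', '.']
    >>> # 'filtered' is not defined in this excerpt; assume it drops punctuation tokens
    >>> filtered = [w for w in text if w.isalnum()]
    >>> counts = Counter(filtered)
    >>> counts
    Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})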

1 Comment

Honestly, I'm working on some NLP stuff, but it would be better if you went into more detail... :)
