Python: Shortest way to extract and count elements from an array of String?

Question

I'm very new to Python and know that my question is very simple, but I haven't found an existing question on SO yet.

I have an array that contains string elements. I want to extract the distinct elements, count how many times each one appears, and then sort them in descending order of count.

For example:

['ab', 'ab', 'ac']

then the output should be:

'ab' 2
'ac' 1

Also, I'm sorry to say that I don't know the best way to store my output (in a map, a hash, or something like that? Again, I'm not sure).

Thanks for any help.

Incidentally, this isn't an array, it's a list, or more generally a "sequence". In Python, array refers to a specific data type.
– Joel Cornett, commented Jul 5, 2012 at 19:33

bnaul · Accepted Answer · 2012-07-05 19:31:09Z

3

This can be done using the Counter class from the collections module.

from collections import Counter
x = ['ab', 'ab', 'ac']
counts = Counter(x)

counts stores the count information for each element; the full list of methods can be found in the documentation, but probably all you care about is that you can access counts directly by treating counts like a hash:

>>> counts['ab']
2


4 Comments

Thiem Nguyen Over a year ago

thanks and +1, but then how can I sort by frequency in descending order?

bnaul Over a year ago

The most_common method will do this. counts.most_common() gives a list ordered from most frequent to least frequent of tuples of the form (elem,count). You could iterate over this with e.g. for elem, count in counts.most_common():.
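As a minimal sketch of the sorting step described in this comment, using the example list from the question (the variable names are only illustrative):

```python
from collections import Counter

x = ['ab', 'ab', 'ac']
counts = Counter(x)

# most_common() returns (element, count) tuples, highest count first
for elem, count in counts.most_common():
    print(elem, count)
# prints:
# ab 2
# ac 1
```

Passing an integer, e.g. counts.most_common(1), limits the result to that many of the most frequent elements.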

Thiem Nguyen Over a year ago

thank you. I will accept this answer. By the way, what is the data type of counts?

bnaul Over a year ago

It's a Counter, a specific class from the collections module. You can read about its methods in the documentation I linked. But you can treat it like a dict in many contexts (which is the Python version of a map or hash).
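To illustrate the "treat it like a dict" point, here is a small sketch, again assuming the example list from the question. Counter is a subclass of dict, so the usual dict operations apply, with one difference worth knowing: a missing key yields 0 instead of raising KeyError.

```python
from collections import Counter

counts = Counter(['ab', 'ab', 'ac'])

# Counter is a subclass of dict
print(isinstance(counts, dict))   # True

# looking up an element that never appeared gives 0, not a KeyError
print(counts['zz'])               # 0

# convert to a plain dict if a plain mapping is needed
plain = dict(counts)
print(plain)                      # {'ab': 2, 'ac': 1}
```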

Wug · Accepted Answer · 2012-07-05 19:32:47Z

1

This is a classic problem, the so-called "Word Count" problem. You would probably want to use a dictionary, Python's built-in mapping type, which has amortized constant-time lookup.

Declare it like this (avoid naming it dict, which would shadow the built-in type):

counts = {}

You can then iterate over your list of tokens with a loop body resembling the following:

if token not in counts:
    counts[token] = 1
else:
    counts[token] += 1

When you're done, you end up with a dictionary containing words as keys and frequencies as values.
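Put together, the approach might look like the sketch below. The names tokens, counts and by_count are illustrative, and the final sorting step is an addition that covers the "descending order" part of the question using the built-in sorted function.

```python
tokens = ['ab', 'ab', 'ac']

# build the word -> frequency dictionary
counts = {}
for token in tokens:
    if token not in counts:
        counts[token] = 1
    else:
        counts[token] += 1

# sort the (word, count) pairs by count, largest first
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

for word, count in by_count:
    print(word, count)
# prints:
# ab 2
# ac 1
```

Because sorted is stable, words with equal counts stay in the order in which they were first seen.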

The following documentation is relevant: http://docs.python.org/release/2.5.2/lib/typesmapping.html


Community · Accepted Answer · 2017-05-23 12:04:43Z

1

There is a library called NLTK: http://nltk.org/.

EDIT: I found something better:

You can look here too - real word count in NLTK.

Code example from the above link:

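    >>> from collections import Counter
    >>> text = ['this', 'is', 'a', 'sentence', '.']
    >>> # assumed filter step: keep only tokens containing a letter or digit
    >>> filtered = [w for w in text if any(c.isalnum() for c in w)]
    >>> counts = Counter(filtered)
    >>> counts
    Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})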

edited May 23, 2017 at 12:04 by Community Bot
answered Jul 5, 2012 at 19:28 by barak1412

1 Comment

Thiem Nguyen Over a year ago

Honestly, I'm working on some NLP stuff, but it would be better if you went into more detail... :)
