Pandas - Count and get unique occurrences of string values from a column

Question

I have a column with over 800 rows, like the sample shown below:

0                            ['Overgrow', 'Chlorophyll']
1                            ['Overgrow', 'Chlorophyll']
2                            ['Overgrow', 'Chlorophyll']
3                               ['Blaze', 'Solar Power']
4                               ['Blaze', 'Solar Power']
5                               ['Blaze', 'Solar Power']
6                               ['Torrent', 'Rain Dish']
7                               ['Torrent', 'Rain Dish']
8                               ['Torrent', 'Rain Dish']
9                            ['Shield Dust', 'Run Away']
10                                         ['Shed Skin']
11                       ['Compoundeyes', 'Tinted Lens']
12                           ['Shield Dust', 'Run Away']
13                                         ['Shed Skin']
14                                   ['Swarm', 'Sniper']
15             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
16             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
17             ['Keen Eye', 'Tangled Feet', 'Big Pecks']

What do I want?

I would like to count the number of times each string value occurs.
I would also like to collect the unique string values into a list.

Here is what I have done to obtain the second part:

import re

# Each cell is a string such as "['Overgrow', 'Chlorophyll']",
# so extract the quoted names with a regex
list_ability = df_pokemon['abilities'].tolist()
new_list = []
for ability_str in list_ability:
    new_list.extend(re.findall(r"'(.*?)'", ability_str, re.DOTALL))

list1 = list(set(new_list))
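If you want to keep the regex approach, you can pass the matches to collections.Counter. This gives the counts and the unique values in one pass. The sketch below assumes the cells are strings like "['Overgrow', 'Chlorophyll']". The small df_pokemon it builds is a hypothetical stand-in for the real 800-row data.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data: each cell is a *string* that looks like a list
df_pokemon = pd.DataFrame({
    "abilities": [
        "['Overgrow', 'Chlorophyll']",
        "['Overgrow', 'Chlorophyll']",
        "['Shield Dust', 'Run Away']",
        "['Shed Skin']",
    ]
})

# Pull every quoted name out of every row and count them all together
ability_counts = Counter(
    name
    for cell in df_pokemon["abilities"]
    for name in re.findall(r"'(.*?)'", cell)
)

# A Counter keeps first-seen order, so this is the list of unique names
unique_abilities = list(ability_counts)

for name, n in ability_counts.most_common():
    print(f"'{name}' - {n}")
```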

I am able to get the unique string values into a list, but is there a better way?

Example of the desired counts:

'Overgrow' - 3
'Chlorophyll' - 3
'Blaze' - 3
'Shield Dust' - 2 ... and so on

(By the way, the name of the column is 'abilities' from the dataframe df_pokemon.)

Have you tried from collections import Counter; counts = df_pokemon.abilities.map(Counter).sum()? – Jon Clements, Commented Oct 14, 2017 at 10:59
@JonClements It returns the number of occurrences of each letter and special character. – Jeru Luke, Commented Oct 14, 2017 at 11:01
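This result follows from the cells being strings rather than lists. Counter counts whatever its argument yields when iterated, and iterating over a string yields individual characters. The one-cell example below uses a hypothetical value to illustrate this.

```python
from collections import Counter

cell_as_string = "['Shed Skin']"  # what the column actually holds
cell_as_list = ['Shed Skin']      # what the comment assumed

# Counting a string counts its individual characters
char_counts = Counter(cell_as_string)
# Counting a list counts whole elements
name_counts = Counter(cell_as_list)

print(char_counts.most_common(3))
print(name_counts)
```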

Bharath M Shetty · Accepted Answer · 2017-10-14 11:47:34Z

6

Since the values are strings, you can use a regex replace and split to turn them into lists, then count the items with collections.Counter, as @JonClements suggested in the comments:

import pandas as pd
from collections import Counter
cleaned = df_pokemon['abilities'].str.replace(r"[\[\]']", '', regex=True)
count = pd.Series(cleaned.str.split(', ').map(Counter).sum())

Output (the row order may vary with the pandas version):

Big Pecks        3
Chlorophyll      3
Rain Dish        3
Run Away         2
Sniper           1
Solar Power      3
Tangled Feet     3
Tinted Lens      1
Blaze            3
Compoundeyes     1
Keen Eye         3
Overgrow         3
Shed Skin        2
Shield Dust      2
Swarm            1
Torrent          3
dtype: int64

To list only the values that occur exactly once, use count[count == 1].index.tolist():

['Sniper', 'Tinted Lens', 'Compoundeyes', 'Swarm']

To list all the distinct values, use the index:

count.index.tolist()
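Below is a self-contained run of this approach on a few hypothetical rows in the question's string format. It makes one variation on the code above. Instead of Series.sum(), it uses Python's built-in sum with an explicit Counter() start value, so the result is well defined even for an empty column. It then applies the count == 1 filter and the index listing.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample rows in the same string format as the question
df_pokemon = pd.DataFrame({
    "abilities": [
        "['Overgrow', 'Chlorophyll']",
        "['Overgrow', 'Chlorophyll']",
        "['Swarm', 'Sniper']",
        "['Shed Skin']",
    ]
})

# Remove brackets and quotes, then split on the ", " separator
as_lists = (
    df_pokemon["abilities"]
    .str.replace(r"[\[\]']", "", regex=True)
    .str.split(", ")
)

# Add up one Counter per row (starting from an empty Counter)
count = pd.Series(sum(as_lists.map(Counter), Counter()))

seen_once = count[count == 1].index.tolist()  # values occurring exactly once
all_abilities = count.index.tolist()          # every distinct value
print(count)
```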

edited Oct 14, 2017 at 11:47

answered Oct 14, 2017 at 11:23

Bharath M Shetty


4 Comments

Jeru Luke Over a year ago

This works like a charm. Is there a better way to do what I did for the second part of my question?

Jeru Luke Over a year ago

Never mind: taking the index of count gives the unique values. Thanks anyway! By the way, I hope the title suits this question.

Jon Clements Over a year ago

Note that if any of your ability names ever contains a comma (or escaped string delimiters, or uses " as the delimiter because ' appears inside the name), this code will split it incorrectly. The ast.literal_eval approach, by contrast, parses it correctly according to the rules of a Python list literal.
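The example below illustrates the failure mode described in the comment above. It uses a hypothetical ability name containing a comma and compares stripping and splitting the string against parsing it with ast.literal_eval.

```python
import ast

# Hypothetical cell whose second ability name contains a comma
cell = "['Shed Skin', 'Run, Away']"

# Naive approach: strip brackets and quotes, then split on ", "
naive = cell.replace("[", "").replace("]", "").replace("'", "").split(", ")

# ast.literal_eval parses the string using Python's own literal syntax
parsed = ast.literal_eval(cell)

print(naive)   # the comma inside the name causes an extra split
print(parsed)
```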

Bharath M Shetty Over a year ago

@JonClements I agree, but I don't think abilities contain commas, since they are short phrases rather than sentences. The column is probably a list that was converted to a string.

Zero · Accepted Answer · 2017-10-14 11:06:07Z

3

Flatten the lists with np.concatenate, then use value_counts:

In [1845]: counts = pd.Series(np.concatenate(df_pokemon.abilities)).value_counts()

In [1846]: counts
Out[1846]:
Rain Dish       3
Keen Eye        3
Chlorophyll     3
Blaze           3
Solar Power     3
Overgrow        3
Big Pecks       3
Tangled Feet    3
Torrent         3
Shield Dust     2
Shed Skin       2
Run Away        2
Compoundeyes    1
Swarm           1
Tinted Lens     1
Sniper          1
dtype: int64
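The following self-contained version of the np.concatenate plus value_counts approach assumes the cells already hold real Python lists. The three rows are hypothetical. It passes .tolist() to np.concatenate so that numpy receives a plain list of lists rather than a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the cells are real Python lists here
df_pokemon = pd.DataFrame({
    "abilities": [
        ["Overgrow", "Chlorophyll"],
        ["Overgrow", "Chlorophyll"],
        ["Shed Skin"],
    ]
})

# Flatten all lists into one array, then count each distinct value
flat = np.concatenate(df_pokemon["abilities"].tolist())
counts = pd.Series(flat).value_counts()
print(counts)
```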

For the unique values, you could use the index:

In [1850]: counts.index.tolist()
Out[1850]:
['Rain Dish','Keen Eye', 'Chlorophyll', 'Blaze', 'Solar Power', 'Overgrow', 
 'Big Pecks', 'Tangled Feet', 'Torrent', 'Shield Dust', 'Shed Skin', 'Run Away',
 'Compoundeyes', 'Swarm', 'Tinted Lens', 'Sniper']

Or,

In [1849]: np.unique(np.concatenate(df_pokemon.abilities))
Out[1849]:
array(['Big Pecks', 'Blaze', 'Chlorophyll', 'Compoundeyes', 'Keen Eye',
       'Overgrow', 'Rain Dish', 'Run Away', 'Shed Skin', 'Shield Dust',
       'Sniper', 'Solar Power', 'Swarm', 'Tangled Feet', 'Tinted Lens',
       'Torrent'],
      dtype='|S12')
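np.unique can also return the counts if you pass return_counts=True. One call then gives both the sorted unique values and how often each occurs. The sketch below uses a hypothetical pre-flattened array in place of np.concatenate(df_pokemon.abilities).

```python
import numpy as np

# Hypothetical flattened abilities (what np.concatenate would produce)
flat = np.array(["Overgrow", "Chlorophyll", "Overgrow", "Chlorophyll", "Shed Skin"])

# return_counts=True returns the sorted unique values and their frequencies
values, counts = np.unique(flat, return_counts=True)

summary = dict(zip(values.tolist(), counts.tolist()))
print(summary)
```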

Note: as Jon pointed out in the comments, if type(df_pokemon.abilities[0]) is not list (for example, the cells are strings like "['Overgrow', 'Chlorophyll']"), convert them to lists first:

import ast
df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval)
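Once the strings are parsed into lists, you can also flatten the column with Series.explode (available since pandas 0.25) instead of np.concatenate. This swaps in a different technique from the answer's own code. The rows are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical sample: cells are strings, as in the question
df_pokemon = pd.DataFrame({
    "abilities": [
        "['Overgrow', 'Chlorophyll']",
        "['Overgrow', 'Chlorophyll']",
        "['Shed Skin']",
    ]
})

# Parse each string into a real Python list
df_pokemon["abilities"] = df_pokemon["abilities"].map(ast.literal_eval)

# explode() puts one list element per row, so value_counts() counts names
counts = df_pokemon["abilities"].explode().value_counts()
unique_abilities = counts.index.tolist()
print(counts)
```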

Details

In [1842]: df_pokemon
Out[1842]:
                              abilities
0               [Overgrow, Chlorophyll]
1               [Overgrow, Chlorophyll]
2               [Overgrow, Chlorophyll]
3                  [Blaze, Solar Power]
4                  [Blaze, Solar Power]
5                  [Blaze, Solar Power]
6                  [Torrent, Rain Dish]
7                  [Torrent, Rain Dish]
8                  [Torrent, Rain Dish]
9               [Shield Dust, Run Away]
10                          [Shed Skin]
11          [Compoundeyes, Tinted Lens]
12              [Shield Dust, Run Away]
13                          [Shed Skin]
14                      [Swarm, Sniper]
15  [Keen Eye, Tangled Feet, Big Pecks]
16  [Keen Eye, Tangled Feet, Big Pecks]
17  [Keen Eye, Tangled Feet, Big Pecks]

In [1843]: df_pokemon.dtypes
Out[1843]:
abilities    object
dtype: object

In [1844]: type(df_pokemon.abilities[0])
Out[1844]: list

edited Oct 14, 2017 at 11:06

answered Oct 14, 2017 at 11:02

Zero


3 Comments

Jon Clements Over a year ago

Judging from the OP's comment on the question, it looks like df_pokemon.abilities.map(ast.literal_eval) is needed to turn the strings into lists first...

Jeru Luke Over a year ago

@Zero To be clear, the values in the abilities column are strings, not lists. For example, df_pokemon['abilities'][0] returns "['Overgrow', 'Chlorophyll']".

Zero Over a year ago

@JeruLuke Run df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval) first, then use the value_counts approach I mentioned.
