4

I have a column with over 800 rows; the first few are shown below:

0                            ['Overgrow', 'Chlorophyll']
1                            ['Overgrow', 'Chlorophyll']
2                            ['Overgrow', 'Chlorophyll']
3                               ['Blaze', 'Solar Power']
4                               ['Blaze', 'Solar Power']
5                               ['Blaze', 'Solar Power']
6                               ['Torrent', 'Rain Dish']
7                               ['Torrent', 'Rain Dish']
8                               ['Torrent', 'Rain Dish']
9                            ['Shield Dust', 'Run Away']
10                                         ['Shed Skin']
11                       ['Compoundeyes', 'Tinted Lens']
12                           ['Shield Dust', 'Run Away']
13                                         ['Shed Skin']
14                                   ['Swarm', 'Sniper']
15             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
16             ['Keen Eye', 'Tangled Feet', 'Big Pecks']
17             ['Keen Eye', 'Tangled Feet', 'Big Pecks']

What do I want?

  1. I would like to count the number of times each string value has occurred.
  2. I also would like to arrange the unique string values into a list.

Here is what I have done to obtain the second part:

import re

list_ability = df_pokemon['abilities'].tolist()
new_list = []
for row in list_ability:
    # Each row is a string such as "['Overgrow', 'Chlorophyll']"
    new_list.extend(re.findall(r"'(.*?)'", row, re.DOTALL))

list1 = set(new_list)

I am able to get the unique string values into a list, but is there a better way?

Example:

'Overgrow' - 3

'Chlorophyll' - 3

'Blaze' - 3

'Shield Dust' - 2 .... and so on

(By the way, the name of the column is 'abilities' from the dataframe df_pokemon.)

3 Comments
  • Have you tried from collections import Counter; counts = df_pokemon.abilities.map(Counter).sum() ? Commented Oct 14, 2017 at 10:59
  • @JonClements It is returning the number of occurrences of each letter and special character. Commented Oct 14, 2017 at 11:01
  • @JonClements Is the title mentioned apt for this question? Commented Oct 14, 2017 at 11:28

2 Answers

6

Since the values are strings, you can use a regex replace and split to convert them to lists, then use Counter, as @JonClements mentioned in the comments, to count them, i.e.:

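from collections import Counter
import pandas as pd

# Strip brackets and quotes (regex=True is required on recent pandas), split on ', ', then sum the per-row Counters
count = pd.Series(df_pokemon['abilities'].str.replace(r"[\[\]']", '', regex=True).str.split(', ').map(Counter).sum())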

Output:

Big Pecks        3
Chlorophyll      3
Rain Dish        3
Run Away         2
Sniper           1
Solar Power      3
Tangled Feet     3
Tinted Lens      1
Blaze            3
Compoundeyes     1
Keen Eye         3
Overgrow         3
Shed Skin        2
Shield Dust      2
Swarm            1
Torrent          3
dtype: int64

To list only the values that occur exactly once, use count[count==1].index.tolist():

['Sniper', 'Tinted Lens', 'Compoundeyes', 'Swarm']

To list every distinct value, take the index:

count.index.tolist()
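
The same replace, split and count idea can be sketched without pandas. This is only an illustration: the `rows` list is a hypothetical stand-in for `df_pokemon['abilities'].tolist()`, and `counts`, `unique_names` and `singletons` are names made up for the example. It also shows why the space after each comma has to be stripped; otherwise `'Chlorophyll'` and `' Chlorophyll'` would be counted as different keys.

```python
from collections import Counter

# Hypothetical sample standing in for df_pokemon['abilities'].tolist()
rows = [
    "['Overgrow', 'Chlorophyll']",
    "['Overgrow', 'Chlorophyll']",
    "['Shed Skin']",
    "['Swarm', 'Sniper']",
]

# Translation table that deletes the characters [, ] and '
drop_chars = str.maketrans("", "", "[]'")

counts = Counter()
for row in rows:
    cleaned = row.translate(drop_chars)
    # strip() removes the space that follows each comma
    counts.update(name.strip() for name in cleaned.split(","))

unique_names = sorted(counts)                               # every distinct value
singletons = [name for name, n in counts.items() if n == 1]  # values seen once
```

Updating a single Counter row by row avoids building one Counter per row and summing them, which may matter on longer columns.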

4 Comments

This works like a charm. Is there a better way to do what I did for the second part of my question?
Nope. But obtaining the index of count gives the unique occurrence. Thanks anyway! By the way I hope this title is apt for this question.
Note that if you ever have , in your ability names (or escaped string delimiters or " as string delimiters because ' is used within it), then this code will split it incorrectly as opposed to the ast.literal_eval approach which'll parse it correctly as per the rules of a Python list.
@JonClements I agree, but I think abilities do not contain , since each one is not a sentence but a phrase or verb in general. It might be a list converted to a string.
3

Use value_counts

In [1845]: counts = pd.Series(np.concatenate(df_pokemon.abilities)).value_counts()

In [1846]: counts
Out[1846]:
Rain Dish       3
Keen Eye        3
Chlorophyll     3
Blaze           3
Solar Power     3
Overgrow        3
Big Pecks       3
Tangled Feet    3
Torrent         3
Shield Dust     2
Shed Skin       2
Run Away        2
Compoundeyes    1
Swarm           1
Tinted Lens     1
Sniper          1
dtype: int64

For the unique values you could use

In [1850]: counts.index.tolist()
Out[1850]:
['Rain Dish','Keen Eye', 'Chlorophyll', 'Blaze', 'Solar Power', 'Overgrow', 
 'Big Pecks', 'Tangled Feet', 'Torrent', 'Shield Dust', 'Shed Skin', 'Run Away',
 'Compoundeyes', 'Swarm', 'Tinted Lens', 'Sniper']

Or,

In [1849]: np.unique(np.concatenate(df_pokemon.abilities))
Out[1849]:
array(['Big Pecks', 'Blaze', 'Chlorophyll', 'Compoundeyes', 'Keen Eye',
       'Overgrow', 'Rain Dish', 'Run Away', 'Shed Skin', 'Shield Dust',
       'Sniper', 'Solar Power', 'Swarm', 'Tangled Feet', 'Tinted Lens',
       'Torrent'],
      dtype='|S12')

Note: as pointed out in Jon's comments, if type(df_pokemon.abilities[0]) is not list, convert the values to lists first:

import ast
df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval)
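
As a rough, pandas-free sketch of why parsing with ast.literal_eval is the safer conversion, the example below counts values from a small hypothetical `rows` list (a stand-in for the string column). The name `'Hot, Dry'` is invented purely to show that a comma inside a value survives parsing, where a plain split on commas would cut it in two.

```python
import ast
from collections import Counter
from itertools import chain

# Hypothetical sample; 'Hot, Dry' is a made-up name containing a comma
rows = [
    "['Overgrow', 'Chlorophyll']",
    "['Blaze', 'Hot, Dry']",
    "['Overgrow']",
]

# Turn each string into a real Python list
parsed = [ast.literal_eval(row) for row in rows]

# Flatten the lists and count every value
counts = Counter(chain.from_iterable(parsed))
unique_names = sorted(counts)
```

Once the column holds real lists, the value_counts and np.unique calls above apply unchanged.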

Details

In [1842]: df_pokemon
Out[1842]:
                              abilities
0               [Overgrow, Chlorophyll]
1               [Overgrow, Chlorophyll]
2               [Overgrow, Chlorophyll]
3                  [Blaze, Solar Power]
4                  [Blaze, Solar Power]
5                  [Blaze, Solar Power]
6                  [Torrent, Rain Dish]
7                  [Torrent, Rain Dish]
8                  [Torrent, Rain Dish]
9               [Shield Dust, Run Away]
10                          [Shed Skin]
11          [Compoundeyes, Tinted Lens]
12              [Shield Dust, Run Away]
13                          [Shed Skin]
14                      [Swarm, Sniper]
15  [Keen Eye, Tangled Feet, Big Pecks]
16  [Keen Eye, Tangled Feet, Big Pecks]
17  [Keen Eye, Tangled Feet, Big Pecks]

In [1843]: df_pokemon.dtypes
Out[1843]:
abilities    object
dtype: object

In [1844]: type(df_pokemon.abilities[0])
Out[1844]: list

3 Comments

From the comment the OP has made on the post - looks like df_pokemon.abilities.map(ast.literal_eval) is needed to make them into lists first...
@Zero Let me make it clear the values in the column abilities are strings. There are no lists present. So when I type in df_pokemon['abilities'][0] it returns "['Overgrow', 'Chlorophyll']"
@JeruLuke -- do df_pokemon.abilities = df_pokemon.abilities.map(ast.literal_eval) and then what I mentioned with value_counts then.
