How to get unique values of a dataframe column when there are lists - python

Question

I have the following dataframe, and I would like to print the unique values of the colors column.

df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'], 'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

Output:
           colors   names
0           green   Terry
1           green     Nor
2          purple  Franck
3  [yellow , red]    Pete
4          orange   Agnes

df.colors.unique() would work fine if there wasn't the [yellow , red] row. As it is I keep getting the TypeError: unhashable type: 'list' error which is understandable.

Is there a way to still get the unique values without taking this row into account?

I tried the following, but none of it worked:

df = df[~df.colors.str.contains(',', na=False)] # Nothing happens
df = df[~df.colors.str.contains('[', na=False)] # Output: error: unterminated character set at position 0
df = df[~df.colors.str.contains(']', na=False)] # Nothing happens

Ideally this should work: df.loc[~df.colors.str.contains('[', na=False, regex=False), 'colors'].unique()
– Mahendra Singh, Commented Oct 17, 2019 at 13:40
@I.M. Do you actually want the values inside the list as well, if they are unique, or do you want to ignore them?
– vb_rises, Commented Oct 17, 2019 at 13:42
For some reason I also get "error: unterminated character set at position 0" with that, @MahendraSingh.
– I.M., Commented Oct 17, 2019 at 13:42
@vb_rises I could live with ignoring them, but ideally I would get the unique values of the column even when some of them are in list format.
– I.M., Commented Oct 17, 2019 at 13:44

jezrael · Accepted Answer · 2019-10-17 14:09:30Z

3

If the values are real lists, detect them with isinstance and filter those rows out:

#changed sample data
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow' , 'red'], 'orange'], 
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

df = df[~df.colors.map(lambda x : isinstance(x, list))]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes

Your own attempt works once the column is cast to strings and regex=False is passed, so that '[' is matched literally:

df = df[~df.colors.astype(str).str.contains('[', na=False, regex=False)] 
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes
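
A minimal sketch of why the cast and the regex=False flag both matter, using a shortened, hypothetical version of the column. In a regular expression '[' opens a character set, which is where the "unterminated character set" error comes from. Escaping it as r'\[' is an alternative to regex=False. The .str accessor only matches against strings, which is why the column is cast with astype(str) first (the list then becomes its printed form).

```python
import pandas as pd

# Shortened, hypothetical version of the column from the question.
colors = pd.Series(['green', ['yellow', 'red'], 'orange'])

# Cast every entry to text; the list becomes the string "['yellow', 'red']".
as_text = colors.astype(str)

# Option 1: treat '[' as a literal character.
literal_mask = as_text.str.contains('[', regex=False)

# Option 2: keep regex matching, but escape the bracket.
escaped_mask = as_text.str.contains(r'\[')

print(literal_mask.tolist())  # [False, True, False]
print(escaped_mask.tolist())  # [False, True, False]
```

Either mask can then be negated with ~ to drop the list rows, as in the line above.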

If you also want the unique values that are inside the lists (pandas 0.25+, run on the original, unfiltered df):

s = df.colors.map(lambda x : x if isinstance(x, list) else [x]).explode().unique().tolist()
print (s)
['green', 'purple', 'yellow', 'red', 'orange']
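
The sample in the question is slightly different: its list holds a single string, 'yellow , red', rather than two separate items. A sketch for that shape, assuming the comma is the only separator used inside such strings, could wrap, explode, and then split:

```python
import pandas as pd

# Original data from the question: the list contains ONE comma-joined string.
df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# Wrap scalars in a list so every entry can be exploded uniformly.
wrapped = df['colors'].map(lambda x: x if isinstance(x, list) else [x])
flat = wrapped.explode()

# Split comma-joined strings, explode again, and trim surrounding spaces.
parts = flat.str.split(',').explode().str.strip()

unique_colors = parts.unique().tolist()
print(unique_colors)  # ['green', 'purple', 'yellow', 'red', 'orange']
```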

edited Oct 17, 2019 at 14:09

answered Oct 17, 2019 at 13:52

jezrael


BENY · Accepted Answer · 2019-10-17 13:49:27Z

2

Let's use type to build a boolean mask that is False for the list rows:

df.colors.apply(lambda x : type(x)!=list)
0     True
1     True
2     True
3    False
4     True
Name: colors, dtype: bool
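
The mask above only marks the rows. As a sketch of the remaining step (using the changed sample data with a real two-item list, which is an assumption about the input), it can be passed to .loc before calling unique():

```python
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# True for every row whose value is not a list.
not_a_list = df.colors.apply(lambda x: type(x) != list)

# Keep only the hashable rows, then take the unique values.
unique_colors = df.loc[not_a_list, 'colors'].unique().tolist()
print(unique_colors)  # ['green', 'purple', 'orange']
```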

answered Oct 17, 2019 at 13:49

BENY


Yaakov Bressler · Accepted Answer · 2019-10-17 14:11:00Z

1

Assuming each of the values in your dataframe is important, here's a technique I frequently use to "unpack lists":

import re

def unlock_list_from_string(string, delim=','):
    """
    Lists are stored as strings (in csv files), e.g. '[1,2,3]'.
    This function unlocks that list.
    Non-string values (such as real lists) are returned unchanged.
    """
    if not isinstance(string, str):
        return string

    # remove brackets (raw string, so the backslashes reach the regex engine)
    clean_string = re.sub(r'\[|\]', '', string)
    unlocked_string = clean_string.split(delim)
    unlocked_list = [x.strip() for x in unlocked_string]
    return unlocked_list

all_colors_nested = df['colors'].apply(unlock_list_from_string)
# unnest
all_colors = [x for y in all_colors_nested for x in y ]

print(all_colors)
# ['green', 'green', 'purple', 'yellow', 'red', 'orange']
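
The flattened list above still contains duplicates. One possible way to reduce it to unique values while keeping first-seen order is sketched below; flatten_unique is a hypothetical helper, not part of the answer, and the nested input is written out by hand to mirror what the apply step produces.

```python
def flatten_unique(values):
    """Flatten one level of nesting and keep the first occurrence of each item."""
    flat = []
    for value in values:
        if isinstance(value, (list, tuple)):
            flat.extend(value)
        else:
            flat.append(value)
    # dict keys keep insertion order (Python 3.7+), so this dedupes in order.
    return list(dict.fromkeys(flat))

# Hand-written stand-in for all_colors_nested.
all_colors_nested = [['green'], ['green'], ['purple'], ['yellow', 'red'], ['orange']]

unique_colors = flatten_unique(all_colors_nested)
print(unique_colors)  # ['green', 'purple', 'yellow', 'red', 'orange']
```

Printing only the deduplicated result also keeps the output short on large inputs.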

answered Oct 17, 2019 at 14:11

Yaakov Bressler


4 Comments

I.M. Over a year ago

Your method is very interesting and works really well here, but I tried it on the dataframe I'm actually working with (which is very big) and it unfortunately fails. I'll keep it for more 'normal'-sized dataframes though.

Yaakov Bressler Over a year ago

What's the error you're receiving? (I use this solution on large dataframes too)

I.M. Over a year ago

The following one:

IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable `--NotebookApp.iopub_data_rate_limit`.

Yaakov Bressler Over a year ago

Ah, your dataframe is very very big. You might consider operating in chunks.

Mahendra Singh · Accepted Answer · 2019-10-18 02:44:03Z

Changes Input Sample

The input in the question had a list stored as a string (as specified by the poster), so the sample below stores it that way and then converts it into a real list of strings.

# Required imports
from ast import literal_eval
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', "['yellow' , 'red']", 'orange'], 
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']
})

Perform literal_eval. For more information, see the Python documentation for ast.literal_eval.

Apply literal_eval to convert the string into an actual list, only on the rows where a list is stored as a string:

list_records = df.colors.str.contains('[', na=False, regex=False)
df.loc[list_records, 'colors'] = df.loc[list_records, 'colors'].apply(literal_eval)

Unique Colors

Works with pandas >= 0.25

df.explode('colors')['colors'].unique().tolist()

Gives

['green', 'purple', 'yellow', 'red', 'orange']
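
The same idea can be sketched as a single pass, assuming every stringified list starts with '[' and is a valid Python literal. parse_if_list is a hypothetical helper name introduced here for illustration.

```python
from ast import literal_eval

import pandas as pd


def parse_if_list(value):
    """Turn a string such as "['a', 'b']" into a real list; leave anything else alone."""
    if isinstance(value, str) and value.strip().startswith('['):
        return literal_eval(value)
    return value


colors = pd.Series(
    ['green', 'green', 'purple', "['yellow' , 'red']", 'orange'],
    dtype=object,
)

# Parse the stringified list, then explode so each color gets its own row.
parsed = colors.map(parse_if_list)
unique_colors = parsed.explode().unique().tolist()
print(unique_colors)  # ['green', 'purple', 'yellow', 'red', 'orange']
```

literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.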
