Get unique values from multiple lists in Pandas column

Question

How can I join the lists in pandas column 'B' and get only the unique values? Sample data:

   A   B 
0  10  [x50, y-1, sss00]
1  20  [x20, MN100, x50, sss00]
2  ...

Expected output:

[x50, y-1, sss00, x20, MN100]

Anurag Dabas · Accepted Answer · 2021-05-12 05:58:36Z

2

You can do this with a list comprehension and the sum() method:

result=[x for x in set(df['B'].sum())]

If you print result, you get the unique values. A set does not preserve order, so the order may differ from the expected output:

['y-1', 'x20', 'sss00', 'x50', 'MN100']

answered May 12, 2021 at 5:58


2 Comments

Don't use sum to concatenate lists. It looks fancy but it's quadratic and should be considered bad practice.

ohh....ok...btw thnx @jezrael for telling this :)
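
To illustrate the comment above: sum() builds a new, longer list at every step, so the cost grows quadratically with the number of rows. Here is a minimal sketch of a linear alternative using itertools.chain.from_iterable. The sample frame is rebuilt from the first two rows of the question, and the variable names are only for illustration.

```python
from itertools import chain

import pandas as pd

# Sample frame rebuilt from the question (first two rows only).
df = pd.DataFrame({
    "A": [10, 20],
    "B": [["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]],
})

# chain.from_iterable visits each item once, so the work grows linearly
# with the total number of items instead of re-copying a growing list.
flat = list(chain.from_iterable(df["B"]))

# set() drops duplicates but does not keep a fixed order.
result = list(set(flat))
```

The result holds the same five values as the sum() version, in whatever order the set yields them.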

jezrael · Accepted Answer · 2021-05-12 06:08:21Z

0

If the input data are strings rather than lists, first convert them to lists:

df.B = df.B.str.strip('[]').str.split(r',\s*')

Or:

import ast
df.B = df.B.apply(ast.literal_eval)
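
Which of the two conversions applies depends on how the strings look. The sketch below uses two hypothetical Series (quoted and bare are made-up names, not from the question) and assumes the items are separated by a comma with optional whitespace.

```python
import ast

import pandas as pd

# Hypothetical inputs: the same data stored as text in two different styles.
quoted = pd.Series(["['x50', 'y-1', 'sss00']", "['x20', 'MN100', 'x50', 'sss00']"])
bare = pd.Series(["[x50, y-1, sss00]", "[x20, MN100, x50, sss00]"])

# Quoted items form valid Python literals, so literal_eval can parse them.
from_quoted = quoted.apply(ast.literal_eval)

# Bare items are not valid literals: strip the brackets, then split on a
# comma plus any following whitespace so no item keeps a leading space.
from_bare = bare.str.strip("[]").str.split(r",\s*")
```

Both routes should end with a Series of real Python lists, which is what the steps that follow expect.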

Use Series.explode to flatten the lists into a single Series, then Series.unique to remove duplicates. This keeps the order of first appearance, so use it if order is important:

L = df.B.explode().unique().tolist()
#alternative
#L = df.B.explode().drop_duplicates().tolist()

print(L)
['x50', 'y-1', 'sss00', 'x20', 'MN100']
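
For a version that can be run on its own, here is a minimal sketch with the sample frame rebuilt from the first two rows of the question. The names exploded and unique_items are only for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question (first two rows only).
df = pd.DataFrame({
    "A": [10, 20],
    "B": [["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]],
})

# explode() gives one row per list item; the index repeats the source row.
exploded = df["B"].explode()
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1, 1]

# unique() keeps the first occurrence of each value, in order of appearance.
unique_items = exploded.unique().tolist()
print(unique_items)  # ['x50', 'y-1', 'sss00', 'x20', 'MN100']
```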

Alternatively, if order is not important, flatten the lists with a comprehension and pass the result to set:

L = list(set([y for x in df.B for y in x]))
print(L)
['x50', 'MN100', 'x20', 'sss00', 'y-1']
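
Because a set has no defined order, the printed order above can change from run to run. As a side note that is not part of the original answer, here is a plain-Python sketch of two ways to get a repeatable result: sorting the set, or using dict.fromkeys, which keeps first-appearance order (dicts preserve insertion order in Python 3.7+). The rows variable stands in for the contents of df.B.

```python
# Stand-in for the lists held in df.B (first two rows of the question).
rows = [["x50", "y-1", "sss00"], ["x20", "MN100", "x50", "sss00"]]

# Option 1: deduplicate with a set, then sort for a repeatable order.
sorted_unique = sorted({item for row in rows for item in row})

# Option 2: dict keys are unique and keep insertion order, so this
# preserves the order in which each value first appears.
ordered_unique = list(dict.fromkeys(item for row in rows for item in row))
```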

answered May 12, 2021 at 5:55


Thanks jezrael, could you please briefly explain the logic?

@nilsinelabore - Added to answer.