How to remove duplicates in a list of strings in a pandas column Python

Question

I am trying to remove the duplicate strings in a list of strings under a column in a Pandas DataFrame.

For example, the list value:

[btc, btc, btc]

should be:

[btc]

I have tried multiple methods; however, none seems to work, as I am unable to access the string values in the list. Any help is much appreciated.

DataFrame:

       dollar_sign  followers_count
0            [btc]            35946
1            [btc]            35946
2            [btc]            35946
3            [nav]            35946
4  [btc, btc, btc]            35946

Accessing the lists of strings in the column:

for row in df_twitter['dollar_sign']:
    print(row)

Output:

[btc]
[btc]
[btc]
[nav]
[btc, btc, btc]

Tai · Accepted Answer · 2018-04-04 22:35:12Z

3

From the information given, I believe the OP's column does not actually hold lists of strings, but strings that look like lists.

The OP's print result shows

[btc]
[btc]
[btc]
[nav]
[btc, btc, btc]

However, if the cells were lists of strings, printing them would yield

['btc']
['btc']
['btc']
['nav']
['btc', 'btc', 'btc']
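
One way to check which case applies is to look at the Python type of each cell. This is only a minimal sketch: the two small frames below are made up for illustration, and the column name dollar_sign is taken from the question.

```python
import pandas as pd

# Hypothetical frames: one holds real lists, the other holds strings that look like lists.
df_lists = pd.DataFrame({'dollar_sign': [['btc'], ['btc', 'btc', 'btc']]})
df_strings = pd.DataFrame({'dollar_sign': ['[btc]', '[btc, btc, btc]']})

# Map each cell to the name of its type and collect the distinct names.
list_cell_types = sorted(df_lists['dollar_sign'].map(lambda v: type(v).__name__).unique())
string_cell_types = sorted(df_strings['dollar_sign'].map(lambda v: type(v).__name__).unique())

print(list_cell_types)    # ['list']
print(string_cell_types)  # ['str']
```

If the result is 'str', the cells need to be parsed before any set-based deduplication will behave as expected.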

Solution:

import pandas as pd

df = pd.DataFrame({
    'dollar_sign': ['[btc]', '[btc]', '[btc]', '[nav]', '[btc, btc, btc]'],
    'followers_count': [35946, 35946, 35946, 35946, 35946],
})

# Strip the brackets, split on ", ", and turn each resulting list into a set.
df.dollar_sign.str[1:-1].str.split(r",\s").map(set)

0    {btc}
1    {btc}
2    {btc}
3    {nav}
4    {btc}
Name: dollar_sign, dtype: object

.str[1:-1] removes the leading [ and the trailing ].
.str.split(r",\s") splits on a comma followed by one whitespace character. (This assumes the strings use ", " as the delimiter; otherwise you may need r"\s*,\s*" or something more sophisticated.)
.map(set) turns each list into a set, which drops the duplicates.
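
If the delimiter spacing is uneven, or if real lists are wanted rather than sets, the same steps can be sketched with a small helper. The helper name parse_and_dedupe and the eth value are hypothetical, added only for illustration, and the sketch assumes every cell is a string wrapped in one pair of square brackets.

```python
import pandas as pd

# Hypothetical sample data; 'eth' and the uneven spacing are made up for illustration.
df = pd.DataFrame({
    'dollar_sign': ['[btc]', '[nav]', '[btc, eth,btc]'],
    'followers_count': [35946, 35946, 35946],
})

def parse_and_dedupe(cell):
    """Turn a string such as '[btc, eth,btc]' into ['btc', 'eth']."""
    inner = cell.strip()[1:-1]                     # drop the surrounding brackets
    items = [s.strip() for s in inner.split(',')]  # tolerate uneven spacing
    items = [s for s in items if s]                # ignore empty pieces, e.g. from '[]'
    return list(dict.fromkeys(items))              # dedupe, keeping first-seen order

# Assign back so the DataFrame column itself is updated.
df['dollar_sign'] = df['dollar_sign'].map(parse_and_dedupe)
print(df['dollar_sign'].tolist())  # [['btc'], ['nav'], ['btc', 'eth']]
```

Unlike the set-based version, this keeps the order in which values first appear and leaves real lists in the column.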

edited Apr 4, 2018 at 22:35

answered Apr 4, 2018 at 22:07


Mangu · Accepted Answer · 2018-04-04 14:54:17Z

3

You can use sets. A set will take out the duplicates.

So, as an example, keeping the style of the output:

for row in df_twitter['dollar_sign']:
    print(list(set(row)))

Output:

[btc]
[btc]
[btc]
[nav]
[btc]

answered Apr 4, 2018 at 14:54


4 Comments

AlpU Over a year ago

I think this is it! Would this update the original dataframe column to these values as well?

Mangu Over a year ago

No. The other answers to this question show how to modify the column; this one is only for displaying the values.

AlpU Over a year ago

This didn't work - it is giving me this: [c, [, b, ], t]

uniquegino Over a year ago

This answer is not wrong. The likely reason you didn't get what you wanted is, as Tai pointed out, that each cell does not hold a real list but a string that has [] in it. Otherwise Mangu's code should work well.
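
The difference described in these comments can be reproduced in plain Python. This is a small illustration with made-up variable names, not the OP's actual data:

```python
cell_as_string = '[btc]'              # a string that only looks like a list
cell_as_list = ['btc', 'btc', 'btc']  # an actual list of strings

# set() iterates over its argument: a string yields characters, a list yields items.
chars = set(cell_as_string)
items = set(cell_as_list)

print(sorted(chars))  # ['[', ']', 'b', 'c', 't']
print(sorted(items))  # ['btc']
```

The first result contains the same five characters as the [c, [, b, ], t] output reported above, which is consistent with the cells being strings.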

BENY · Accepted Answer · 2018-04-04 15:08:03Z

2

You can use list with map, and set to get the unique values:

df['dollar_sign']=list(map(set,df['dollar_sign']))
df
Out[1068]: 
  dollar_sign  followers_count
0       {btc}            35946
1       {btc}            35946
2       {btc}            35946
3       {nav}            35946
4       {btc}            35946

This is how I create the df

df = pd.DataFrame({
    'dollar_sign': [['btc'], ['btc'], ['btc'], ['nav'], ['btc', 'btc', 'btc']],
    'followers_count': [35946, 35946, 35946, 35946, 35946],
})

edited Apr 4, 2018 at 15:08

answered Apr 4, 2018 at 14:54


2 Comments

AlpU Over a year ago

It gave me the value as: {c, [, b, ], t}

AlpU Over a year ago

it is the same, but still not getting that

Ben Wilson · Accepted Answer · 2020-06-29 18:47:35Z

0

Simpler, and this turns each cell back into a list so you can stack, unstack, etc.:

df['column_name'] = df['column_name'].apply(set).apply(list)
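
Note that a set does not guarantee element order, so the resulting lists may come back in a different order than the originals. If order matters, one possible alternative (a sketch that swaps set for dict.fromkeys, assuming the cells are real lists of hashable items; the sample data is made up) is:

```python
import pandas as pd

# Hypothetical sample data using the placeholder column name from the answer.
df = pd.DataFrame({'column_name': [['btc'], ['nav', 'btc', 'nav'], ['btc', 'btc', 'btc']]})

# dict.fromkeys keeps the first occurrence of each key in insertion order (Python 3.7+).
df['column_name'] = df['column_name'].apply(lambda items: list(dict.fromkeys(items)))

print(df['column_name'].tolist())  # [['btc'], ['nav', 'btc'], ['btc']]
```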

answered Jun 29, 2020 at 18:47

