Create a new column of lists in Pandas dataframe with unique values from another column

Question

I have a dataframe with a column of lists:

  full_list_to_check
0                NaN
1                NaN
2    [1, 2, 3, 4, 5]
3             [6, 6]
4           [11, 11]

I need to create a new column that holds the de-duplicated list for each row if the list contains duplicates, and otherwise just the same list.

  full_list_to_check          new_col
0                NaN              NaN
1                NaN              NaN
2    [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3             [6, 6]              [6]
4           [11, 11]             [11]

I have tried this:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)))

But I get this error:

TypeError: 'float' object is not iterable

Replace your NaN values with an empty string: dataframe.fillna('', inplace=True). It will work.
– qaiser, Commented Jan 15, 2020 at 11:49

Manualmsdos · Accepted Answer · 2020-01-15 12:13:31Z

4

You must check for NaN:

df['full_list_to_check'].apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)

Update:

df['full_list_to_check'].apply(lambda x: list(set(x)) if x is not np.nan else np.nan)

0                NaN
1                NaN
2    [1, 2, 3, 4, 5]
3                [6]
4               [11]
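
For anyone who wants to run this end to end, here is a minimal sketch of the first variant above. The sample frame is reconstructed from the question, and the helper name unique_or_nan is only an illustrative assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

def unique_or_nan(x):
    # Hypothetical helper name. pd.isna gives one bool for a scalar and a
    # boolean array for a list; np.any reduces both to a single truth value.
    if np.any(pd.isna(x)):
        return np.nan
    # Same de-duplication as in the answer: build a set, then a list.
    return list(set(x))

df["new_col"] = df["full_list_to_check"].apply(unique_or_nan)
print(df)
```

Note that set() does not guarantee element order. If the original order matters, list(dict.fromkeys(x)) keeps the first occurrence of each value in place.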

edited Jan 15, 2020 at 12:13

answered Jan 15, 2020 at 11:43

Manualmsdos


4 Comments

deadvoid Over a year ago

Why if not np.any(pd.isna(x)) instead of if x is not np.nan?

Mikhail Over a year ago

@deadvoid pd.isna(x) returns an array of booleans when x is a list.

deadvoid Over a year ago

@Mikhail just genuinely curious: if the purpose is to filter out NaN, why does it require [True, True, False, ...]? In this case every value is either NaN or a list. If the goal is to make it more general so it can be applied elsewhere... I guess I'm just trying to wrap my head around it.

deadvoid Over a year ago

@DimaFirst ah, okay. I thought there was some reason I didn't know, like performance or something else :) but it's all good, just honestly curious.
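
As a small illustration of the point discussed in these comments (a sketch added for clarity, not code from the thread), this shows what pd.isna returns for a scalar and for a list, and why np.any is wrapped around it. The variable names are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Scalar input: pd.isna returns a single boolean.
scalar_result = pd.isna(np.nan)

# List input: pd.isna returns a NumPy boolean array, one entry per element.
list_result = pd.isna([6, 6])

# np.any collapses either shape to one truth value that is safe to use in `if`.
scalar_flag = bool(np.any(scalar_result))
list_flag = bool(np.any(list_result))

# Side effect of np.any: a list that merely contains a NaN also counts as missing.
mixed_flag = bool(np.any(pd.isna([1, np.nan])))

print(scalar_result, list_result, scalar_flag, list_flag, mixed_flag)
```

The identity test x is not np.nan is shorter, but it only matches when the missing value is the np.nan object itself. For example, float('nan') is not np.nan evaluates to True, so a NaN created that way would slip through.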

Mikhail · Accepted Answer · 2020-01-15 12:00:32Z

2

You can try this:

df['new_col'] = df.loc[~df['full_list_to_check'].isna(), 'full_list_to_check'].apply(lambda x: list(set(x)))

  full_list_to_check          new_col
0                NaN              NaN
1                NaN              NaN
2    [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3             [6, 6]              [6]
4           [11, 11]             [11]
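
The one-liner above can be unpacked into steps to show why the NaN rows end up as NaN in the new column. This is a sketch under the assumption that the frame looks like the one in the question. The names has_value and deduped are illustrative, and notna() is used as the equivalent of ~isna().

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

# Boolean mask selecting only the rows that actually hold a list.
has_value = df["full_list_to_check"].notna()

# apply runs only on the selected rows, so set() never sees a float NaN.
# The result keeps the original index labels 2, 3 and 4.
deduped = df.loc[has_value, "full_list_to_check"].apply(lambda x: list(set(x)))

# Column assignment aligns on the index; labels missing from `deduped`
# (rows 0 and 1) are filled with NaN.
df["new_col"] = deduped
print(df)
```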

answered Jan 15, 2020 at 12:00

Mikhail


gbruenjes · Accepted Answer · 2020-01-15 12:15:06Z

2

You could use:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)) if isinstance(x,list) else x)

The other answers only work if there are no values other than lists or NaN in your data.
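
To make that concrete, here is a minimal sketch with a hypothetical column that mixes NaN, a list, a string and an integer. The data and the helper name dedupe_lists_only are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column: NaN, a list, a plain string and an integer.
s = pd.Series([np.nan, [6, 6], "abc", 7], dtype=object)

def dedupe_lists_only(x):
    # Only lists are de-duplicated; every other value is passed through as-is.
    if isinstance(x, list):
        return list(set(x))
    return x

result = s.apply(dedupe_lists_only)
print(result)
```

With a plain list(set(x)) and only a NaN check, the integer 7 would raise a TypeError and the string "abc" would be split into its characters, which is probably not what you want.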

answered Jan 15, 2020 at 12:15

gbruenjes
