
I have a dataframe with a column of lists:

   full_list_to_check
0                 NaN
1                 NaN
2     [1, 2, 3, 4, 5]
3              [6, 6]
4            [11, 11]

I need to create a new column that holds each row's list with duplicates removed, or the same list unchanged if it has no duplicates:

   full_list_to_check          new_col
0                 NaN              NaN
1                 NaN              NaN
2     [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3              [6, 6]              [6]
4            [11, 11]             [11]
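The expected output keeps each list in its original order. set() makes no ordering guarantee, so one option, sketched here as an assumption rather than something from the question, is dict.fromkeys, which removes duplicates while keeping first-occurrence order (dicts preserve insertion order since Python 3.7). The helper name dedupe is hypothetical.

```python
import numpy as np
import pandas as pd


def dedupe(x):
    """Hypothetical helper: drop duplicates from a list, keeping first-occurrence order.

    Anything that is not a list (e.g. NaN) is returned unchanged.
    """
    if isinstance(x, list):
        # dict keys are unique and keep insertion order (Python 3.7+)
        return list(dict.fromkeys(x))
    return x


df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)
df["new_col"] = df["full_list_to_check"].apply(dedupe)
print(df)
```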

I have tried this:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)))

But I get this error:

TypeError: 'float' object is not iterable
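The error occurs because pandas stores the missing cells as the float NaN, and set() cannot iterate over a float. A minimal reproduction, assuming a column shaped like the one above:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, [6, 6]])

# The missing cell is a plain float, not a list
print(type(s[0]))  # <class 'float'>

error_message = None
try:
    set(s[0])  # same call the lambda makes on a NaN row
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'float' object is not iterable
```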
    Comment: Replace your NaN values with an empty string, dataframe.fillna('', inplace=True), and it will work. (Commented Jan 15, 2020 at 11:49)
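Filling with an empty string does stop the error, but set('') is an empty set, so the missing rows come back as empty lists rather than NaN. A quick check, filling only the one column instead of calling fillna with inplace=True on the whole frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"full_list_to_check": [np.nan, [6, 6]]})

# Replace NaN with '' in this column only, then deduplicate
filled = df["full_list_to_check"].fillna("")
df["new_col"] = filled.apply(lambda x: list(set(x)))

print(df["new_col"].tolist())  # [[], [6]]
```

If the expected output (NaN stays NaN) matters, this approach does not reproduce it.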

3 Answers


You must check for NaN first:

import numpy as np
import pandas as pd

df['full_list_to_check'].apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)

Update:

df['full_list_to_check'].apply(lambda x: list(set(x)) if x is not np.nan else np.nan)
0                NaN
1                NaN
2    [1, 2, 3, 4, 5]
3                [6]
4               [11]
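Note that x is not np.nan is an identity test: it only recognizes the exact np.nan object. A NaN created some other way, such as float('nan'), is a different object, so the check would let it through to set(). A sketch of a value-based test instead (the helper name is_nan_scalar is hypothetical):

```python
import math

import numpy as np
import pandas as pd

# Both values are NaN, but they are different objects
print(float("nan") is np.nan)  # False


def is_nan_scalar(x):
    """Hypothetical helper: True only for a float NaN; lists and other objects give False."""
    # np.float64 subclasses float, so NumPy NaNs are caught too
    return isinstance(x, float) and math.isnan(x)


s = pd.Series([float("nan"), np.nan, [6, 6]])
result = s.apply(lambda x: np.nan if is_nan_scalar(x) else list(set(x)))
print(result.tolist())
```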

4 Comments

Why use if not np.any(pd.isna(x)) instead of if x is not np.nan?
@deadvoid When x is a list, pd.isna(x) returns an array of booleans, not a single value.
@Mikhail Just genuinely curious: if the purpose is to filter out NaN, why go through an array like [True, True, False, ...]? In this case every value is either NaN or a list. Is the goal to make it more general so it applies elsewhere? I'm just trying to wrap my head around it.
@DimaFirst Ah, okay. I thought there was some reason I didn't know about, like performance or something else :) It's all good, I was just honestly curious.
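The point behind this thread: pd.isna returns a single boolean for a scalar but an element-wise NumPy array for a list, and that array cannot be used directly in an if; np.any collapses either result to one boolean. One side effect worth knowing is that a list containing a NaN element is also treated as missing. A small illustration:

```python
import numpy as np
import pandas as pd

print(pd.isna(np.nan))      # True
print(pd.isna([1, 2, 3]))   # [False False False]

# Using the array directly in a condition raises for more than one element
try:
    if pd.isna([1, 2, 3]):
        pass
except ValueError as exc:
    print(exc)

# np.any reduces both cases to a single boolean
print(np.any(pd.isna(np.nan)), np.any(pd.isna([1, 2, 3])))  # True False

# A list with a NaN element is also flagged, so that row would become NaN
print(np.any(pd.isna([1, np.nan])))  # True
```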

You can try this:

df['new_col'] = df.loc[~df['full_list_to_check'].isna(), 'full_list_to_check'].apply(lambda x: list(set(x)))
   full_list_to_check          new_col
0                 NaN              NaN
1                 NaN              NaN
2     [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
3              [6, 6]              [6]
4            [11, 11]             [11]
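This works because assigning a Series to a column aligns on the index: apply runs only on the non-missing rows, and the rows filtered out by the mask receive NaN in new_col. A sketch that makes the alignment visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"full_list_to_check": [np.nan, np.nan, [1, 2, 3, 4, 5], [6, 6], [11, 11]]}
)

mask = df["full_list_to_check"].notna()
deduped = df.loc[mask, "full_list_to_check"].apply(lambda x: list(set(x)))
print(deduped.index.tolist())  # [2, 3, 4]

# Index labels 0 and 1 are absent from `deduped`, so those rows get NaN
df["new_col"] = deduped
print(df)
```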


You could use:

df['new_col'] = df['full_list_to_check'].apply(lambda x: list(set(x)) if isinstance(x, list) else x)

The other answers only work if your data contains nothing but lists and NaN; this version passes any other value through unchanged.
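To see the difference, suppose the column also held a string (a hypothetical row added for illustration). The isinstance version leaves it untouched, whereas a NaN-only check passes the string to set() and splits it into characters:

```python
import numpy as np
import pandas as pd

# Hypothetical column: NaN, a list, and a stray string
s = pd.Series([np.nan, [6, 6], "abc"])

by_type = s.apply(lambda x: list(set(x)) if isinstance(x, list) else x)
by_nan = s.apply(lambda x: list(set(x)) if not np.any(pd.isna(x)) else np.nan)

print(by_type.tolist())        # [nan, [6], 'abc']
print(sorted(by_nan.iloc[2]))  # ['a', 'b', 'c']
```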
