
I have a column in a dataframe which contains lists. I want to be able to remove elements from these lists based on elements that I have in another list (as shown below).

I tried a list comprehension, but the column comes back unchanged:

import pandas as pd

sys_list = ['sys1', 'sys2', 'sys3']
df = pd.DataFrame({'A':[['sys1', 'sys2', 'user1'], 
                        ['user3', 'user6', 'user1'], 
                        ['sys1', 'sys2', 'sys3']]})

df['A'] = [item for item in df['A'] if item not in sys_list]

print(df)

                       A
0    [sys1, sys2, user1]
1  [user3, user6, user1]
2     [sys1, sys2, sys3]

I need to achieve this:

                       A
0                [user1]
1  [user3, user6, user1]
2                     []

Any thoughts?
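Here is a small sketch of why the attempt above leaves the column unchanged. Iterating over df['A'] yields each row's whole list, and a list never compares equal to a string, so `item not in sys_list` is always True. The filter has to run inside each row's list.

```python
import pandas as pd

sys_list = ['sys1', 'sys2', 'sys3']
df = pd.DataFrame({'A': [['sys1', 'sys2', 'user1'],
                         ['user3', 'user6', 'user1'],
                         ['sys1', 'sys2', 'sys3']]})

# Iterating a Series yields its values: one whole list per row
print(type(df['A'][0]))  # <class 'list'>

# A list is never equal to a string, so nothing gets filtered out
outer = [item for item in df['A'] if item not in sys_list]
print(outer == df['A'].tolist())  # True

# Filtering inside each row's list gives the desired result
inner = [[s for s in row if s not in sys_list] for row in df['A']]
print(inner)  # [['user1'], ['user3', 'user6', 'user1'], []]
```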

3 Answers


Use Series.apply:

sys_set = set(sys_list)  # build the lookup set once, not once per element
df['B'] = df['A'].apply(lambda x: [item for item in x if item not in sys_set])
print(df)
                       A                      B
0    [sys1, sys2, user1]                [user1]
1  [user3, user6, user1]  [user3, user6, user1]
2     [sys1, sys2, sys3]                     []

Or a similar nested list comprehension, like the one in a deleted answer:

sys_set = set(sys_list)
df['B'] = [[item for item in lst if item not in sys_set] for lst in df['A']]

Or a solution with sets and set.difference (the order of elements within each list is not preserved, and duplicates are dropped):

df['B'] = df['A'].map(lambda x: list(set(x).difference(sys_list)))
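A quick sketch on the sample data showing that the argument order of set.difference matters: set(x).difference(sys_list) keeps the row's non-system elements, while set(sys_list).difference(x) returns the system names missing from the row, which is not what the question asks for.

```python
import pandas as pd

sys_list = ['sys1', 'sys2', 'sys3']
df = pd.DataFrame({'A': [['sys1', 'sys2', 'user1'],
                         ['user3', 'user6', 'user1'],
                         ['sys1', 'sys2', 'sys3']]})

# Intended direction: elements of each row that are not system names
kept = df['A'].map(lambda x: set(x).difference(sys_list))

# Reversed direction: system names that do NOT appear in each row
reversed_dir = df['A'].map(set(sys_list).difference)

# kept holds the same elements as the order-preserving list comprehension
expected = [[i for i in x if i not in sys_list] for x in df['A']]
print(all(s == set(e) for s, e in zip(kept, expected)))  # True
print(reversed_dir[0])  # {'sys3'}
```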

3 Comments

Surprisingly .map(set(sys_list).difference) performs slightly better. I would have expected both approaches to perform just as well
Thank you! It worked on the sample I provided. However, when I apply any of these solutions to my actual df I get "TypeError: 'float' object is not iterable". I have checked the values and all of them are strings, so I am super puzzled by this...
@kajnwunda - I think there are missing values (NaN is a float). Is it possible to remove them first with df = df.dropna(subset=['A'])?
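If you would rather keep the rows with missing values, one minimal sketch is to filter only values that are actually lists and pass everything else (such as NaN) through unchanged. The helper name drop_sys and the sample row containing NaN are hypothetical.

```python
import numpy as np
import pandas as pd

sys_set = {'sys1', 'sys2', 'sys3'}
# Hypothetical data: the second row is missing (NaN), like in the real df
df = pd.DataFrame({'A': [['sys1', 'sys2', 'user1'],
                         np.nan,
                         ['sys1', 'sys2', 'sys3']]})

def drop_sys(value):
    """Remove system names from a list; pass non-list values (e.g. NaN) through."""
    if isinstance(value, list):
        return [item for item in value if item not in sys_set]
    return value

df['B'] = df['A'].apply(drop_sys)
print(df)
```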

With Series.apply:

df.A.apply(lambda x: [i for i in x if i not in sys_list])

0                  [user1]
1    [user3, user6, user1]
2                       []
Name: A, dtype: object

1 Comment

@YevhenKuzmovych [[item for item in l if item not in sys_list] for l in df['A']] also works, please don't delete.

You can use sets for better performance (this approach assumes that the order within the lists is not important, since it will change, and that duplicate elements can be dropped):

sys_set = {'sys1', 'sys2', 'sys3'}

# Convert each list to a set, subtract the system names, convert back to a list
df['A'] = df['A'].map(lambda s: list(set(s) - sys_set))

print(df)
                       A
0                [user1]
1  [user6, user1, user3]
2                     []
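To illustrate the assumption stated in this answer, here is a small sketch on a hypothetical row with a repeated name: the set round-trip drops the duplicate and does not keep the original order, while the list comprehension keeps both.

```python
import pandas as pd

sys_set = {'sys1', 'sys2', 'sys3'}
# Hypothetical row containing a repeated user name
df = pd.DataFrame({'A': [['user1', 'sys1', 'user2', 'user1']]})

as_set = df['A'].map(lambda s: list(set(s) - sys_set))
as_list = df['A'].map(lambda s: [x for x in s if x not in sys_set])

# Sorted only for a stable display; set iteration order is arbitrary
print(sorted(as_set[0]))  # ['user1', 'user2']
print(as_list[0])         # ['user1', 'user2', 'user1']
```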

