Let's say I have the following dataframe:

import pandas as pd

a = [[1, 2, 3, 4, 5, 6], [23, 23, 212, 223, 1, 12]]
b = [1, 1]

df = pd.DataFrame(zip(a, b), columns=['a', 'b'])

My goal is to remove from each list in column a the value held in column b of the same row. My attempt is below:

df['a'] = [i.remove(j) for i,j in zip(df.a, df.b)]

The logic seems sound to me, but I end up with df['a'] being a series of nulls. What is going on here?

5 Answers

Here's an alternative way of doing it:

In []:
df2 = df.explode('a')
df['a'] = df2.a[df2.a != df2.b].groupby(level=0).apply(list)
df

Out[]:
                        a  b
0         [2, 3, 4, 5, 6]  1
1  [23, 23, 212, 223, 12]  1

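list.remove(x) removes the first occurrence of x in place and returns None, so your list comprehension collects those None values and assigns them to df['a']. That is why the above code is failing for you. Because the lists are modified in place, you can also do something like the following.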

a = [[1,2,3,4,5,6],[23,23,212,223,1,12]]
b = [1,1]
df = pd.DataFrame(zip(a,b), columns = ['a', 'b'])
for i, j in zip(df.a, df.b):
    i.remove(j)

print(df)

                        a  b
0         [2, 3, 4, 5, 6]  1
1  [23, 23, 212, 223, 12]  1
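
To see the None return value directly, here is a minimal sketch using plain lists rather than a dataframe. The variable names (values, rows, targets, collected) are made up for illustration:

```python
# list.remove mutates the list it is called on and returns None.
values = [1, 2, 3, 1]
result = values.remove(1)  # removes only the first 1

print(result)  # None
print(values)  # [2, 3, 1]

# Collecting the return values in a comprehension therefore yields only None,
# even though the underlying lists have been modified.
rows = [[1, 2, 3], [23, 1, 12]]
targets = [1, 1]
collected = [row.remove(t) for row, t in zip(rows, targets)]

print(collected)  # [None, None]
print(rows)       # [[2, 3], [23, 12]]
```

Note that list.remove raises ValueError when the value is not in the list, so the in-place loop only works if every row actually contains its b value.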

Assuming each row of b contains only one value, you can use a list comprehension inside a function and apply it row by row. Unlike list.remove, this drops every occurrence of the value, not just the first:

import pandas as pd
a = [[1,2,3,4,5,6],[23,23,212,223,1,12]]
b = [1,1]


df = pd.DataFrame(zip(a,b), columns = ['a', 'b'])
def removing(row):
    val = [x for x in row['a'] if x != row['b']]
    return val
df['c'] = df.apply(removing,axis=1)
print(df)

Output:

                           a  b                       c
0         [1, 2, 3, 4, 5, 6]  1         [2, 3, 4, 5, 6]
1  [23, 23, 212, 223, 1, 12]  1  [23, 23, 212, 223, 12]

What I would do:

s=pd.DataFrame(df.a.tolist(),index=df.index)
df['a']=s.mask(s.eq(df.b,0)).stack().astype(int).groupby(level=0).apply(list)
Out[264]: 
0           [2, 3, 4, 5, 6]
1    [23, 23, 212, 223, 12]
dtype: object

How about this:

b = [[1],[1]] 

df['a'] = df.apply(lambda row: list(set(row['a']).difference(set(row['b']))), axis=1)

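b must be a list of lists, as shown, but this also works when you want to remove more than one element per row. Note that going through set drops duplicate values in a and does not guarantee the original order.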

Example:

import pandas as pd
a = [[1,2,3,4,5,6],[23,23,212,223,1,12]]
b = [[1,5],[1,23]]


df = pd.DataFrame(zip(a,b), columns = ['a', 'b'])



df['a'] = df.apply(lambda row: list(set(row['a']).difference(set(row['b']))), axis=1)
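
If order and duplicates matter, a list comprehension can do the same multi-value removal without converting a to a set. This is a minimal sketch in plain Python; remove_values is a hypothetical helper name, not part of pandas:

```python
def remove_values(values, to_remove):
    """Return a new list without any element found in to_remove.

    Hypothetical helper: keeps the original order and any duplicates
    among the elements that are not removed.
    """
    blocked = set(to_remove)  # set gives fast membership checks
    return [v for v in values if v not in blocked]


a = [[1, 2, 3, 4, 5, 6], [23, 23, 212, 223, 1, 12]]
b = [[1, 5], [1, 23]]

cleaned = [remove_values(i, j) for i, j in zip(a, b)]
print(cleaned)  # [[2, 3, 4, 6], [212, 223, 12]]

# Duplicates that are not targeted are kept, in their original order.
kept = remove_values([23, 23, 212, 1], [1])
print(kept)  # [23, 23, 212]
```

With the dataframe from the example above, the same helper could be applied as df['a'] = [remove_values(i, j) for i, j in zip(df.a, df.b)].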
