1

I have a pandas dataframe

import pandas as pd
dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]]})

I would like to create a column which will assess if values col_a are in col_b.

The output dataframe should look like this:

dt = pd.DataFrame({'id' : ['a', 'a', 'a', 'b', 'b'],
                   'col_a': [1,2,3,1,2],
                   'col_b': [2,2,[2,3],4,[2,3]],
                   'exists': [0,1,1,0,1]})

How could I do that ?

2
  • 1
    If data are 'col_b': [2,2,[2,3],4,[20,30]] then last value is False or True ? You need test by rows? Commented Apr 19, 2021 at 8:41
  • 1
    it should be false then Commented Apr 19, 2021 at 8:59

2 Answers 2

4

You can use:

dt["exists"] = dt.col_a.isin(dt.col_b.explode()).astype(int)

explode the list-containing column and check if col_a isin it. Lastly cast to int.

to get

>>> dt
  id  col_a   col_b  exists
0  a      1       2       0
1  a      2       2       1
2  a      3  [2, 3]       1
3  b      1       4       0
4  b      2  [2, 3]       1 

If row-by-row comparison is required, you can use:

dt["exists"] = dt.col_a.eq(dt.col_b.explode()).groupby(level=0).any().astype(int)

which checks equality by row and if any of the (grouped) exploded values gives True, we say it exists.

Sign up to request clarification or add additional context in comments.

2 Comments

Solution is wrong if OP need test per rows, working only for sample data, try 'col_b': [2,2,[2,3],4,[20,30]],
@jezrael thank you, i added a line for strict row inclusion case.
1

Solutions if need test values per rows (it means not each value of column cola_a by all values of col_b):

You can use custom function with if-else statement:

f = lambda x: x['col_a'] in x['col_b'] 
              if isinstance(x['col_b'], list) 
              else x['col_a']== x['col_b']
dt['e'] = dt.apply(f, axis=1).astype(int)
print (dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1

Or DataFrame.explode with compare both columns and then test it at least one True per index values:

dt['e'] = dt.explode('col_b').eval('col_a == col_b').any(level=0).astype(int)
print (dt)
  id  col_a   col_b  exists  e
0  a      1       2       0  0
1  a      2       2       1  1
2  a      3  [2, 3]       1  1
3  b      1       4       0  0
4  b      2  [2, 3]       1  1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.