I would like to add a binary column (1 or 0) that indicates whether each row of another column contains a non-empty list (1) or an empty list (0).

For example:

Country          Test
Germany          []
Italy            ['pizza']
United Kingdom   ['queen', 'king', 'big']
France           ['Eiffel']
Spain            []

...

What I would expect is something like this:

Country          Test                       Binary
Germany          []                         0
Italy            ['pizza']                  1
United Kingdom   ['queen', 'king', 'big']   1
France           ['Eiffel']                 1
Spain            []                         0

...

I do not know how to use np.where, or some other method, to get this result.
I think that to check whether a column contains an empty list I should do something like this: df[df['Test'] != '[]']

  • df['Binary'] = (df['Test'] != []).astype(int) Commented Sep 28, 2020 at 1:40
  • getting this error: ValueError: Lengths must match to compare Commented Sep 28, 2020 at 1:44
  • try df['Binary'] = (df['Test'].neq([])).astype(int) then Commented Sep 28, 2020 at 1:46
  • Finally, a working solution: df['Test'].astype(bool).astype(int) Commented Sep 28, 2020 at 1:49
  • df['Binary'] = (df['Test'].str.len() != 0).astype(int) worked for me. Commented Sep 28, 2020 at 2:04

2 Answers

You can check the length of each list and convert the result of that comparison to 0 or 1.

df['Binary'] = (df['Test'].str.len() != 0).astype(int)

While this works, the most efficient way to do it was provided by @Marat in the comments:

df['Binary'] = df['Test'].astype(bool).astype(int)
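The cast to bool works because Python treats an empty list as falsy and a non-empty list as truthy, and the cast applies that rule to each element. Here is a minimal sketch of that behaviour, assuming the column holds real Python lists; the `sample` Series is made up for illustration:

```python
import pandas as pd

# Hypothetical sample column holding real Python lists
sample = pd.Series([[], ['pizza'], ['queen', 'king', 'big']])

# Element-wise truthiness: [] -> False, non-empty list -> True
as_bool = sample.astype(bool)

# Equivalent spelling that calls bool() on each element explicitly
as_bool_map = sample.map(bool)

# True/False -> 1/0
binary = as_bool.astype(int)
print(binary.tolist())  # [0, 1, 1]
```

If the column was loaded from a CSV file, the cells may be strings such as '[]' rather than lists. Every non-empty string is truthy, so this cast would then return 1 for all rows.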

The full code is here:

import pandas as pd

c = ['Country', 'Test']
d = [['Germany', []],
     ['Italy', ['pizza']],
     ['United Kingdom', ['queen', 'king', 'big']],
     ['France', ['Eiffel']],
     ['Spain', []]]

df = pd.DataFrame(data=d, columns=c)
df['Binary'] = df['Test'].astype(bool).astype(int)
print(df)

The output of this will be:

          Country                Test  Binary
0         Germany                  []       0
1           Italy             [pizza]       1
2  United Kingdom  [queen, king, big]       1
3          France            [Eiffel]       1
4           Spain                  []       0

Use str.len:

np.clip(df.Test.str.len(), 0, 1)
# or
(df.Test.str.len() != 0).astype(int)
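Since the question mentions np.where, the same length test can also be written with it. This is only a sketch, assuming `df['Test']` holds real Python lists; the small frame below is built purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame; the 'Test' column holds real Python lists
df = pd.DataFrame({'Country': ['Germany', 'Italy', 'Spain'],
                   'Test': [[], ['pizza'], []]})

# .str.len() returns the number of items in each list
lengths = df['Test'].str.len()

# 1 where the list has at least one item, 0 where it is empty
df['Binary'] = np.where(lengths > 0, 1, 0)
print(df)
```

Note that the comparison has to be `> 0` (or `!= 0`) for non-empty lists to map to 1; testing `== 0` would flag the empty lists instead.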
