Create a True/False column in Python DataFrame(1,0) based on two columns values

Question

I'm having trouble creating a new column based on the columns 'language_1' and 'language_2' in a pandas DataFrame. I want to create a 'bilingual' column where 1 represents a user who speaks both English and Spanish (bilingual) and 0 represents a non-bilingual speaker. Ultimately I want to compare the two groups' average ratings, but I need to categorize them first. I tried using if statements, but I'm not sure how to write one that combines multiple conditions into a single value. Thank you for any help.


===============================================================================================

name      language_1    language_2    rating    bilingual
Kevin     English       Null          4.25
Miguel    English       Spanish       4.56
Carlos    English       Spanish       4.61
Aaron     Null          Spanish       4.33


===============================================================================================

Here is the code I've tried to use to append the new column to my dataframe.

def label_bilingual(row):
    if row('language_english') == row['English'] and row('language_spanish') == 'Spanish':
        val = 1
    else:
        val = 0

df_doc_1['bilingual'] = df_doc_1.apply(label_bilingual, axis=1)

Here is the error I'm getting.

----> 1 df_doc_1['bilingual'] = df_doc_1.apply(label_bilingual, axis=1)
'Series' object is not callable

PacketLoss · Accepted Answer · 2021-03-15 01:08:41Z

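Your function has a few issues: one causes the error you are seeing, and the others will cause further problems once that one is fixed.

1 - You access the columns with row('name'). Parentheses make this a function call, and a Series is not callable; use square brackets instead: row['name']. A DataFrame fails the same way: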

df('row')
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    df('row')
TypeError: 'DataFrame' object is not callable

2 - You compare row['column'] to row['English'], which will not work because no column named English exists. Compare against the string 'English' instead.

KeyError: 'English'

3 - Your function never returns a value; it only assigns to val:

    val = 1

    val = 0
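To illustrate point 3, here is a minimal sketch of what happens when the function only assigns val. A plain dict stands in for a DataFrame row (an assumption made for brevity), and the name label_without_return is hypothetical.

```python
def label_without_return(row):
    # Same logic as the original function, with the indexing fixed,
    # but still missing a return statement.
    if row["language_1"] == "English" and row["language_2"] == "Spanish":
        val = 1
    else:
        val = 0
    # No return here, so Python implicitly returns None.


# A dict stands in for one row of the DataFrame.
result = label_without_return({"language_1": "English", "language_2": "Spanish"})
```

With df.apply, every row would therefore receive None instead of 1 or 0.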

Modify your function as below to resolve these errors.

def label_bilingual(row):
    if row['language_1'] == 'English' and row['language_2'] == 'Spanish':
        return 1
    else:
        return 0

Output

>>> df['bilingual'] = df.apply(label_bilingual, axis=1)
>>> df
     name language_1 language_2  rating  bilingual
0   Kevin    English       Null    4.25          0
1  Miguel    English    Spanish    4.56          1
2  Carlos    English    Spanish    4.61          1
3   Aaron       Null    Spanish    4.33          0
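As an alternative to apply, the same condition can be written as a vectorized comparison, which is usually faster on large frames. This is a sketch that rebuilds the sample data from the question; the variable names is_bilingual and mean_rating are my own. It also shows one way to get the average rating per group, which the question mentions as the end goal.

```python
import pandas as pd

# Sample data copied from the question; "Null" is a literal string here.
df = pd.DataFrame({
    "name": ["Kevin", "Miguel", "Carlos", "Aaron"],
    "language_1": ["English", "English", "English", "Null"],
    "language_2": ["Null", "Spanish", "Spanish", "Spanish"],
    "rating": [4.25, 4.56, 4.61, 4.33],
})

# Compare each column element-wise and combine the boolean Series with &
# (the `and` keyword does not work on whole Series).
is_bilingual = (df["language_1"] == "English") & (df["language_2"] == "Spanish")

# Convert True/False to 1/0.
df["bilingual"] = is_bilingual.astype(int)

# Average rating for each group (0 = not bilingual, 1 = bilingual).
mean_rating = df.groupby("bilingual")["rating"].mean()
```

Note that each comparison needs its own parentheses, because & binds more tightly than ==.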

NotAName · Accepted Answer · 2021-03-15 00:57:09Z

To make it simpler, I'd suggest recording missing values in either column as numpy.nan. For example, if missing values were recorded as np.nan:

bilingual = np.where(df[['language_1', 'language_2']].isna().any(axis=1), 0, 1)
df['bilingual'] = bilingual

Here np.where evaluates the condition row by row; the condition checks whether the value in either language column is missing. If it is, the person is not bilingual and gets a 0; otherwise they get a 1. Note that this treats any two recorded languages as bilingual, not specifically English and Spanish.
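A runnable sketch of this approach on the question's sample data follows. It assumes the missing values are currently stored as the literal string "Null", as in the question's table, and converts them to np.nan first; the names lang_cols and missing_any are my own.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with missing values stored as "Null".
df = pd.DataFrame({
    "name": ["Kevin", "Miguel", "Carlos", "Aaron"],
    "language_1": ["English", "English", "English", "Null"],
    "language_2": ["Null", "Spanish", "Spanish", "Spanish"],
    "rating": [4.25, 4.56, 4.61, 4.33],
})

lang_cols = ["language_1", "language_2"]

# Replace the "Null" placeholder with a real missing value.
df[lang_cols] = df[lang_cols].replace("Null", np.nan)

# True for each row where at least one language column is missing.
missing_any = df[lang_cols].isna().any(axis=1)

# 0 where a language is missing, 1 where both are present.
df["bilingual"] = np.where(missing_any, 0, 1)
```

DataFrame.isna is used here rather than np.isnan because np.isnan raises a TypeError on columns that hold strings.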
