0

I'm having trouble creating a new column based on the columns 'language_1' and 'language_2' in a pandas DataFrame. I want to create a 'bilingual' column where 1 represents a user who speaks both English and Spanish (bilingual) and 0 represents a non-bilingual speaker. Ultimately I want to compare the two groups' average ratings, but I want to categorize the users first. I tried using if statements, but I'm not sure how to write one that combines multiple conditions into a single value. Thank you for any help.


===============================================================================================

name      language_1    language_2    rating    bilingual
Kevin     English       Null          4.25
Miguel    English       Spanish       4.56
Carlos    English       Spanish       4.61
Aaron     Null          Spanish       4.33


===============================================================================================

Here is the code I've tried to use to append the new column to my dataframe.

def label_bilingual(row):
    if row('language_english') == row['English'] and row('language_spanish') == 'Spanish':
        val = 1
    else:
        val = 0

df_doc_1['bilingual'] = df_doc_1.apply(label_bilingual, axis=1)

Here is the error I'm getting.

----> 1 df_doc_1['bilingual'] = df_doc_1.apply(label_bilingual, axis=1)
'Series' object is not callable

2 Answers

1

Your function has a few issues: one is causing the error you see, and the others will cause problems once it is fixed.


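1 - You have tried to access the column with row('name'). Parentheses call the object as a function, and a Series (like a DataFrame) is not callable. Use square brackets instead: row['name'].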

df('row')
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    df('row')
TypeError: 'DataFrame' object is not callable

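2 - You have tried to compare row['column'] to row['English'], which will not work: row['English'] looks up a column named English, and no such column exists. Compare against the string 'English' instead. The names 'language_english' and 'language_spanish' have the same problem, since the columns are actually called language_1 and language_2.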

KeyError: 'English'

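3 - Your function does not return a value. It assigns val in both branches but never returns it, so apply receives None for every row.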

    val = 1

    val = 0
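The three issues above can be reproduced in isolation. This is a minimal sketch on a hypothetical two-row frame (the data and the name label_no_return are made up for illustration), not the asker's actual df_doc_1:

```python
import pandas as pd

# Hypothetical two-row frame using the question's column names.
df = pd.DataFrame({
    "language_1": ["English", "Null"],
    "language_2": ["Spanish", "Spanish"],
})
row = df.iloc[0]

# Issue 1: parentheses try to call the Series like a function.
try:
    row("language_1")
except TypeError as exc:
    call_error = str(exc)

# Issue 2: row["English"] looks for a label named "English", which does not exist.
try:
    row["English"]
except KeyError as exc:
    key_error = exc

# Issue 3: a function that assigns val but never returns it yields None.
def label_no_return(row):
    if row["language_1"] == "English" and row["language_2"] == "Spanish":
        val = 1
    else:
        val = 0

result = df.apply(label_no_return, axis=1)
print(call_error)
print(repr(key_error))
print(result)
```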

You need to modify your function as below to resolve these errors.


def label_bilingual(row):
    if row['language_1'] == 'English' and row['language_2'] == 'Spanish':
        return 1
    else:
        return 0

Output

>>> df['bilingual'] = df.apply(label_bilingual, axis=1)
>>> df
     name language_1 language_2  rating  bilingual
0   Kevin    English       Null    4.25          0
1  Miguel    English    Spanish    4.56          1
2  Carlos    English    Spanish    4.61          1
3   Aaron       Null    Spanish    4.33          0
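
Since the end goal in the question is to compare average ratings, here is a self-contained sketch that rebuilds the sample table, applies the corrected function, and groups by the new column. It assumes "Null" is stored as a plain string, as in the output above. The bilingual_vec column is an optional vectorized alternative added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; "Null" is assumed to be a literal string.
df = pd.DataFrame({
    "name": ["Kevin", "Miguel", "Carlos", "Aaron"],
    "language_1": ["English", "English", "English", "Null"],
    "language_2": ["Null", "Spanish", "Spanish", "Spanish"],
    "rating": [4.25, 4.56, 4.61, 4.33],
})

def label_bilingual(row):
    # 1 only when both conditions hold for this row.
    if row["language_1"] == "English" and row["language_2"] == "Spanish":
        return 1
    return 0

df["bilingual"] = df.apply(label_bilingual, axis=1)

# Vectorized equivalent: combine the two boolean columns with &, then cast to int.
df["bilingual_vec"] = (
    (df["language_1"] == "English") & (df["language_2"] == "Spanish")
).astype(int)

# Average rating per group (0 = not bilingual, 1 = bilingual).
avg_by_group = df.groupby("bilingual")["rating"].mean()
print(avg_by_group)
```

On larger frames the vectorized form is usually faster than apply with axis=1, because it avoids calling a Python function once per row.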

0

To make it simpler, I'd suggest recording missing values in either column as numpy.nan. For example, if missing values were recorded as np.nan:

bilingual = np.where(df[['language_1', 'language_2']].isna().any(axis=1), 0, 1)
df['bilingual'] = bilingual

Here np.where evaluates the condition row by row: whether the value in either language column is missing. If it is, the person is not bilingual and gets a 0; otherwise they get a 1. Note that this treats any two non-missing languages as bilingual, so it matches the English-and-Spanish definition only if those are the only languages in the data.
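
A runnable sketch of the missing-value approach, assuming the data starts with the literal string "Null" as in the question and is converted to np.nan first. It uses the pandas isna() method to build the row-wise mask, since the language columns hold strings.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "Null" as a literal string (an assumption).
df = pd.DataFrame({
    "name": ["Kevin", "Miguel", "Carlos", "Aaron"],
    "language_1": ["English", "English", "English", "Null"],
    "language_2": ["Null", "Spanish", "Spanish", "Spanish"],
    "rating": [4.25, 4.56, 4.61, 4.33],
})

# Convert the "Null" placeholder to a real missing value in the language columns.
cols = ["language_1", "language_2"]
df[cols] = df[cols].replace("Null", np.nan)

# True for rows where at least one language is missing.
missing_any = df[cols].isna().any(axis=1)

# 0 where a language is missing, 1 where both are present.
df["bilingual"] = np.where(missing_any, 0, 1)
print(df)
```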
