Creating a new dataframe using boolean masks

Question

I have a dataframe with text in a column called text and the language each text is written in stored in the column lang. I am trying to create a second dataframe containing only the text written in English (rows with the value en in the lang column). The dataframe also contains other values, so I can't just copy it. This is what I tried:

english_only = df['lang'] == 'en'
df_2 = pd.DataFrame(df[english_only]['text'],columns = ['text','sentiment'])

When I run the code I get a dataframe of the same length as the original one, but it contains only NaN values. How can I solve this?

jezrael · Accepted Answer · 2020-05-28 11:22:51Z


The DataFrame constructor is not necessary here. Filter the rows with the boolean mask and select the columns with a list of names in a single DataFrame.loc call (this works if df contains a sentiment column):

df_2 = df.loc[english_only, ['text','sentiment']]
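A minimal sketch of this selection, assuming a small made-up frame (the sample rows and sentiment values are illustrative only, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "text": ["hello", "bonjour", "good morning", "hola"],
    "lang": ["en", "fr", "en", "es"],
    "sentiment": [1, 0, 1, 0],
})

# Boolean mask: True for rows whose lang is 'en'.
english_only = df["lang"] == "en"

# Rows are chosen by the mask, columns by the list of names.
df_2 = df.loc[english_only, ["text", "sentiment"]]
print(df_2)
```

The result keeps the original index labels of the matching rows; call reset_index(drop=True) on it if a fresh 0-based index is preferred.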

If you want to add the sentiment column later, select only the text column:

df_2 = df.loc[english_only, ['text']]
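A rough sketch of the text-only case, again on made-up data. The .copy() call and the sentiment labels are assumptions added for illustration: the copy keeps df_2 independent of df, which may avoid a SettingWithCopyWarning when the new column is assigned.

```python
import pandas as pd

# Hypothetical sample data without a sentiment column.
df = pd.DataFrame({
    "text": ["hello", "bonjour", "good morning", "hola"],
    "lang": ["en", "fr", "en", "es"],
})

english_only = df["lang"] == "en"

# The list ['text'] (rather than the bare string 'text') keeps the
# result a one-column DataFrame instead of a Series.
df_2 = df.loc[english_only, ["text"]].copy()

# Add the sentiment column afterwards; these labels are placeholders.
df_2["sentiment"] = ["positive", "neutral"]
print(df_2)
```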

edited May 28, 2020 at 11:22

answered May 28, 2020 at 11:17


2 Comments

Luca Marinescu Over a year ago

The original doesn't have the sentiment column, but I only took the text and will add that column later. Thanks.

jezrael Over a year ago

@LucaMarinescu - added a solution for this situation
