I have a dataframe containing text in a column called text, with the language each text is written in stored in the column lang. I am trying to create a second dataframe containing only the text written in English (rows that have the value en in the lang column). The dataframe also contains other values, so I can't just copy it. This is what I tried:

english_only = df['lang'] == 'en'
df_2 = pd.DataFrame(df[english_only]['text'],columns = ['text','sentiment'])

When I run the code, I get a dataframe of the same length as the original one, but it contains only NaN values. How can I solve this?

1 Answer

The DataFrame constructor is not necessary here. Filter the rows with the boolean mask and select the columns by name in a list, both in a single DataFrame.loc call (this solution works if df contains a sentiment column):

df_2 = df.loc[english_only, ['text','sentiment']]

If you want to add the sentiment column later:

df_2 = df.loc[english_only, ['text']]
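
For reference, here is a minimal end-to-end sketch of the second variant. The sample rows, the extra column named other, and the pd.NA placeholder for sentiment are assumptions made up for illustration; only the text and lang column names come from the question. The .copy() call is an optional precaution, not part of the solution above.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'text' and 'lang'
# come from the question, the rows and the 'other' column are made up.
df = pd.DataFrame({
    "text": ["hello world", "bonjour le monde", "good morning", "hola mundo"],
    "lang": ["en", "fr", "en", "es"],
    "other": [1, 2, 3, 4],
})

# Boolean mask: True for rows whose language is English.
english_only = df["lang"] == "en"

# Rows are selected by the mask, columns by the list of names.
# The explicit .copy() makes df_2 independent of df, so adding a
# column later cannot raise a SettingWithCopyWarning.
df_2 = df.loc[english_only, ["text"]].copy()

# Add the sentiment column afterwards (placeholder values, an assumption).
df_2["sentiment"] = pd.NA

print(df_2)
```

Note that df_2 keeps the original row labels of the matching rows; call df_2.reset_index(drop=True) if a fresh 0-based index is preferred.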

2 Comments

The original doesn't have the sentiment column, but I only took the text and will add that column later. Thanks.
@LucaMarinescu - added a solution for this situation.
