I'm trying to translate non-English text to English using TextBlob's translate function. My data is stored in a pandas DataFrame.

I know that it works outside of a DataFrame. For example,

from textblob import TextBlob

what = TextBlob("El apartamento de Evan esta muy bien situado, con fácil acceso al cualquier punto de Manhattan gracias al metro.")
whatt = what.translate(to='en')
print(whatt)

But when I apply it to a pandas DataFrame column, TextBlob's translate doesn't work properly.
I searched for a way to address this and found the code below, but it gave me a different error message. Could anyone help me with this?

data["comments"] = data["comments"].str.encode('ISO 8859-1', 'ignore').apply(lambda x: TextBlob(x.strip()).translate(to='en'))

TypeError: cannot use a string pattern on a bytes-like object

1 Answer

Interesting problem

import pandas as pd
from textblob import TextBlob
data = { 'number' : [1,2], 'comments' : ['El apartamento de Evan','Manhattan gracias al metro' ] }
df = pd.DataFrame(data)

Then let's put the translation into a new column, converting each result to a plain string:

df["commentst"] = df["comments"].apply(lambda x: str(TextBlob(x).translate(to='en')))

and that gives

    number  comments                    commentst
0   1       El apartamento de Evan      Evan's Apartment
1   2       Manhattan gracias al metro  Manhattan thanks to the subway

And here is a minimal attempt that leaves text alone when it is already detected as English:

def get_english(message):
    analysis = TextBlob(message)
    language = analysis.detect_language()
    if language == 'en':
        return message
    return str(analysis.translate(to='en'))

df["commentst"] = df["comments"].apply(get_english)
df

It gives the same result as above on my sample, but I am not sure how it will behave on your data.
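
As for the TypeError in the question, a likely cause is the .str.encode(...) step: it turns every comment into a bytes object, and somewhere inside the TextBlob call a string regex pattern is then applied to those bytes. The sketch below reproduces the same message with only the standard re module. The sample text and the whitespace pattern are illustrative assumptions, not taken from TextBlob's source.

```python
import re

comment = "El apartamento de Evan"
# .str.encode(...) in pandas produces bytes like this for every row
encoded = comment.encode("ISO-8859-1", "ignore")

# A str pattern, as text-processing code normally uses
pattern = re.compile(r"\s+")

# A str pattern works on str input
print(pattern.split(comment))

# The same pattern raises TypeError on bytes input
try:
    pattern.split(encoded)
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Decoding back to str (or never encoding in the first place) avoids it
decoded = encoded.decode("ISO-8859-1")
print(pattern.split(decoded))
```

So dropping the .str.encode(...) call and passing the plain strings to TextBlob, as in the code above, should be enough to get past that particular error.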


5 Comments

Hi, thanks for your answer! I have an additional question. If a comment mixes English with other languages, I noticed that it also gives an error (NotTranslated: Translation API returned the input string unchanged). I tried using try/except but Python wouldn't do anything. Do you have any idea how I can address this issue?
Do you know what languages you are coming from? It helps a lot with the translation.
I checked, and it's mostly Spanish, but it seems some other languages are in there too (customer reviews). I can't check every row since there are too many of them.
Thank you for the update. I ran it on a sub-sample of my data and it works fine. It leaves only the English text as is. But I have around 27000 reviews and ran into an error -> HTTPError: Too Many Requests. This seems to come not from your code but from some kind of restriction or limit on the number of requests that can be sent to the Google Translation API?
You are implicitly calling Google Translate, and they may have limits. I don't know about that.
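
For the two problems raised in these comments, one possible approach is to wrap each single-row translation in its own try/except, so that one failing row does not abort the whole apply, and to pause and retry when the service reports too many requests. The following is only a sketch under stated assumptions: safe_translate, fake_translate, and the two exception classes are hypothetical stand-ins defined here, not TextBlob names. With a TextBlob version that still provides translate, one would catch the library's own NotTranslated error and the HTTP error instead.

```python
import time


class NotTranslated(Exception):
    """Stand-in for the library's 'input returned unchanged' error."""


class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 error from the translation service."""


def safe_translate(text, translate_fn, retries=3, delay=1.0, sleep=time.sleep):
    """Translate text, keeping the original when translation is not possible."""
    for attempt in range(retries):
        try:
            return translate_fn(text)
        except NotTranslated:
            # Text is probably already in the target language: keep it.
            return text
        except TooManyRequests:
            # Pause before the next attempt: delay, 2*delay, 4*delay, ...
            if attempt < retries - 1:
                sleep(delay * 2 ** attempt)
    # Still rate-limited after all retries: give up and keep the original.
    return text


# Hypothetical translator used only to exercise the wrapper.
def fake_translate(text):
    if text == "hello":
        raise NotTranslated(text)
    return text.upper()


results = [safe_translate(t, fake_translate) for t in ["hola", "hello"]]
print(results)
```

On a DataFrame this would be used row by row, for example df["comments"].apply(lambda s: safe_translate(s, my_translate)), where my_translate is whatever function performs the actual call. Retrying with pauses only slows the requests down. Whether that is enough for around 27000 reviews depends on the service's limits, which are not known here.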
