
I have a set of strings, and for each one I need to detect which country it belongs to, based on the GPE entity that gets detected.

sentences = [
    "I watched TV in germany",
    "Mediaset ITA canale 5",
    "Je viens d'Italie",
    "Ich komme aus Deutschland",
    "He is from the UK",
    "Soy de Inglaterra",
    "Sono del Regno Unito"
]

So my expectation is: "I watched TV in germany" => DE, "Soy de Inglaterra" => UK, and so on.
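
To make the expected output concrete, this is roughly the lookup I plan to apply to whatever country mention gets detected. The names and codes in this dict are just a hand-picked illustration, not a complete mapping:

# Hypothetical lookup from a detected country mention to an ISO-style code.
# Only a handful of hand-picked entries to illustrate the expected output.
COUNTRY_CODES = {
    "germany": "DE",
    "deutschland": "DE",
    "italie": "IT",
    "italia": "IT",
    "uk": "UK",
    "inglaterra": "UK",
    "regno unito": "UK",
}

def to_country_code(mention):
    # Normalise the mention and look it up; returns None if unknown.
    return COUNTRY_CODES.get(mention.strip().lower())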

I've tried with this code:

import spacy
from spacy.pipeline import EntityRuler  # imported but not used yet

nlp = spacy.load('xx_ent_wiki_sm')  # Multilingual model


def detect_country(text):
    doc = nlp(text)
    countries = []
    for ent in doc.ents:
        if ent.label_ == 'GPE':  # keep only geopolitical entities
            countries.append(ent.text)
    return countries


for sentence in sentences:
    countries = detect_country(sentence)
    print(f"Countries detected in '{sentence}': {countries}")

But the results are completely empty:

Countries detected in 'I watched TV in germany': []
Countries detected in 'Mediaset ITA canale 5': []
Countries detected in 'Je viens d'Italie': []
Countries detected in 'Ich komme aus Deutschland': []
Countries detected in 'He is from the UK': []
Countries detected in 'Soy de Inglaterra': []
Countries detected in 'Sono del Regno Unito': []

I don't understand whether I'm missing something in the spaCy pipeline, or whether I've actually made a good choice of multilingual model. Thanks in advance for any advice.
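
One thing I haven't tried yet, but which might be worth showing: checking which entity labels the loaded model can actually produce, since I'm only filtering on 'GPE'. Something like this (just a diagnostic sketch, assuming the NER component in this model is named 'ner'):

import spacy

nlp = spacy.load('xx_ent_wiki_sm')
print(nlp.pipe_names)               # components in this model's pipeline
print(nlp.get_pipe('ner').labels)   # entity labels this NER component can produce

doc = nlp("Ich komme aus Deutschland")
print([(ent.text, ent.label_) for ent in doc.ents])  # see which label, if any, Deutschland gets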

