I am trying to recognize Brazilian CPF numbers as an entity in my NER app using spaCy. My current code is the following:
```python
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("pt_core_news_sm")
text = "João mora na Bahia, 22/11/1985, seu cpf é 111.222.333-11"

ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "CPF", "pattern": [{"SHAPE": "ddd.ddd.ddd-dd"}]},
]
ruler.add_patterns(patterns)

doc = nlp(text)
# extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)
```
The result was only:
```
João PER
Bahia LOC
```
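A likely explanation is tokenization. A `SHAPE` constraint is checked against one token at a time, so `ddd.ddd.ddd-dd` can only match if the tokenizer keeps the whole CPF as a single token, and spaCy's tokenizer probably splits it (for example around the hyphen). The sketch below is one way to check this and to build a pattern that follows the actual split. Assumptions: `spacy.blank("pt")` stands in for `pt_core_news_sm` so it runs without a model download (it should use the same Portuguese tokenizer rules), and the sample CPF `111.222.333-11` is just the one from the question.

```python
import spacy

# Assumption: a blank Portuguese pipeline stands in for pt_core_news_sm here
nlp = spacy.blank("pt")

# Tokenize a sample CPF on its own to see how the tokenizer splits it;
# a one-token SHAPE pattern can only match if this prints a single token
sample_tokens = nlp.make_doc("111.222.333-11")
print([t.text for t in sample_tokens])

# One SHAPE constraint per token, so the pattern follows the tokenizer's split
cpf_pattern = [{"SHAPE": t.shape_} for t in sample_tokens]

ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "CPF", "pattern": cpf_pattern}])

text = "João mora na Bahia, 22/11/1985, seu cpf é 111.222.333-11"
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Because the digits are all mapped to `d` in the shape, the pattern derived from one sample should match any CPF written in the same `ddd.ddd.ddd-dd` format. With the trained pipeline, passing `before="ner"` to `nlp.add_pipe` is an option if the statistical NER ever claims an overlapping span first.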
I also tried a regex pattern:
```python
{"label": "CPF", "pattern": [{"TEXT": {"REGEX": r"^\d{3}\.\d{3}\.\d{3}\-\d{2}$"}}]},
```
That did not work either.
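The regex attempt most likely fails for the same reason: `{"TEXT": {"REGEX": ...}}` is applied to each token's text separately, and the `^...$` anchors require one token to contain the entire CPF. One workaround is to run the regex over the raw `doc.text` and turn each match into a span with `Doc.char_span`. This is a minimal sketch, not the only approach; the component name `cpf_regex` and the constant `CPF_RE` are hypothetical, and `spacy.blank("pt")` again stands in for `pt_core_news_sm`.

```python
import re

import spacy
from spacy.language import Language
from spacy.tokens import Doc
from spacy.util import filter_spans

# CPF written as ddd.ddd.ddd-dd; \b avoids matching inside longer digit runs
CPF_RE = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")


@Language.component("cpf_regex")  # hypothetical component name
def cpf_regex(doc: Doc) -> Doc:
    """Add a CPF entity for every regex match over the raw text."""
    new_spans = []
    for match in CPF_RE.finditer(doc.text):
        # "expand" snaps the character offsets to the enclosing token boundaries,
        # so the match works no matter how the tokenizer split the CPF
        span = doc.char_span(
            match.start(), match.end(), label="CPF", alignment_mode="expand"
        )
        if span is not None:
            new_spans.append(span)
    # doc.ents cannot hold overlapping spans; filter_spans keeps the longest
    doc.ents = filter_spans(list(doc.ents) + new_spans)
    return doc


# Assumption: blank pipeline used so the sketch runs without a model download
nlp = spacy.blank("pt")
nlp.add_pipe("cpf_regex", last=True)

doc = nlp("João mora na Bahia, 22/11/1985, seu cpf é 111.222.333-11")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

On the blank pipeline this should print only `111.222.333-11 CPF`. With `pt_core_news_sm` loaded instead and the component added after `ner`, entities such as `João` and `Bahia` are kept as long as they do not overlap a CPF match.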
How can I fix this so that the CPF is recognized as an entity?