Hi everyone, I am running this code in spaCy to do regex-like matching, but I get an error:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc1 = nlp("Hello hello hello, how are you?")
doc2 = nlp("Hello, how are you?")
doc3 = nlp("How are you?")
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
matcher = Matcher(nlp.vocab)
matcher.add("greetings", [pattern])
for mid, start, end in matcher(doc1):
    print(start, end, doc1[start:end])

The error is

pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
                                                                                  ^
SyntaxError: invalid syntax

I am following the book Mastering spaCy and copy-pasted the code from it, after checking that I did not include any special characters.

Regards

  • It should probably be pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*","IS_PUNCT": True}] or maybe pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]}},{"OP": "*"},{"IS_PUNCT": True}]. I'm voting to close this as a typo. Commented Jan 4, 2023 at 2:25
  • I receive the following error in Python 3.10: SyntaxError: ':' expected after dictionary key. That should make it fairly obvious. Commented Jan 4, 2023 at 2:31
  • Hi, yes, it is a typo. As you mentioned, there was a missing } here: pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*"},{"IS_PUNCT": True}]. Thank you. Commented Jan 4, 2023 at 2:39

1 Answer

"A pattern added to the Matcher consists of a list of dictionaries." (from the spaCy docs)

Here is your code, written more legibly:

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
        {"IS_PUNCT": True}
    }
]

The first dictionary has three entries, but the third entry is malformed: each entry in a dictionary must be a key: value pair, and the third entry is a bare value, {"IS_PUNCT": True}, with no key. That is not valid dictionary syntax, hence the SyntaxError.
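To check this without spaCy, here is a minimal sketch that asks Python's own parser whether each pattern literal is valid. The helper name parses and the two string variables are made up for illustration; the second string is just one possible repair.

```python
import ast

# The pattern from the question: the third item of the first dict is a bare
# value with no key, so Python cannot parse it.
broken = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", {"IS_PUNCT": True}}]'

# One possible repair: close the first dict before opening the second.
fixed = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"}, {"IS_PUNCT": True}]'


def parses(source: str) -> bool:
    """Hypothetical helper: True if `source` is a valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print("broken parses:", parses(broken))
print("fixed parses:", parses(fixed))
```

Since the failure happens at parse time, no line of the script runs, which is why the error appears before spaCy even loads.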

Along those lines, the docs also say:

"Each dictionary describes one token and its attributes."

A token whose lowercase form is in ["hello", "hi", "hallo"] can never be punctuation, so a single dictionary cannot describe both. You seem to want to match something like "Hi Hi Hello!": a greeting token that may repeat, followed by a punctuation token. That takes two dictionaries, one per token, for example:

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
    },
    { "IS_PUNCT": True }
]
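
For completeness, here is a sketch of the corrected pattern wired into a Matcher and run end to end. It assumes a blank English pipeline (spacy.blank("en")) instead of en_core_web_md, which should be enough because LOWER and IS_PUNCT are lexical attributes; find_greetings is a made-up helper name.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline suffices, since LOWER and IS_PUNCT
# are lexical attributes that need no trained model.
nlp = spacy.blank("en")

# The question's code never created the matcher; it needs the shared vocab.
matcher = Matcher(nlp.vocab)
pattern = [
    {"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"},  # zero or more greetings
    {"IS_PUNCT": True},                                      # then one punctuation token
]
matcher.add("greetings", [pattern])


def find_greetings(text):
    """Hypothetical helper: return (start, end, text) for every match."""
    doc = nlp(text)
    return [(start, end, doc[start:end].text) for _, start, end in matcher(doc)]


for start, end, span_text in find_greetings("Hello hello hello, how are you?"):
    print(start, end, span_text)
```

Because "*" allows zero repetitions, a punctuation token on its own also matches. If at least one greeting is required, "OP": "+" is the operator to use instead.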