I'm trying to match phrases like mexico 1 - 2 cameroon with a regex. The pattern matches when I test it in regexpal, but I get no match in Python using re.

My regex is:

    regex = '(mexico[\s]*\d[\s]*[-][\s]*\d[\s]*cameroon)|(mexico[\s]*\d[\s]*cameroon[\s]\d)|(mexico[\s]*\d[\s]*[-][\s]*cameroon[\s]*\d)|(cameroon[\s]*\d[\s]*[-][\s]*\d[\s]*mexico)|(cameroon[\s]*\d[\s]*mexico[\s]\d)|(cameroon[\s]*\d[\s]*[-][\s]*mexico[\s]*\d)'

and my test phrase:

    testtweet = "RT @remitouja: @TheJUMPsociety cameroon 1 - 1 mexico #winecup #WorldCup"

The test phrase matches in regexpal but not in Python. However, the following shorter phrase matches in both: cameroon 1 - 1 mexico #winecup #WorldCup

I test it with:

    if re.match(regex, testtweet) is not None:
        print("Is true")

  • Show your code please. I suspect you're using re.match while you should be using re.search. Commented Jun 13, 2014 at 16:00
  • You must use a raw string: regex = r'(mexico...' Commented Jun 13, 2014 at 16:00
  • Correct, I am using: if re.match(regex, testtweet) is not None: print("Is true") Commented Jun 13, 2014 at 16:02

1 Answer

You need to use re.search.

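You get no match because re.match only tries the pattern at the start of the string, as if the whole regex were wrapped in ^(?:mexico...|...). Your tweet starts with RT @remitouja:, so nothing matches at position 0. re.search scans the whole string and returns the first match it finds anywhere: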

if re.search(regex, testtweet) is not None:
    print("Is true")
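
To make the difference concrete, here is a minimal sketch that runs both functions on the two phrases from the question. The pattern is shortened to a single alternative for readability, and the variable names (`pattern`, `tweet`, `bare`) are only illustrative:

```python
import re

# Shortened pattern for illustration: only the "cameroon N - N mexico" layout.
pattern = r'cameroon\s*\d\s*-\s*\d\s*mexico'

tweet = 'RT @remitouja: @TheJUMPsociety cameroon 1 - 1 mexico #winecup #WorldCup'
bare = 'cameroon 1 - 1 mexico #winecup #WorldCup'

# re.match only tries the pattern at position 0 of the string.
match_on_tweet = re.match(pattern, tweet)  # None: the tweet starts with "RT"
match_on_bare = re.match(pattern, bare)    # match object: the score comes first

# re.search scans forward and returns the first match anywhere in the string.
search_on_tweet = re.search(pattern, tweet)

print(match_on_tweet is None)      # True
print(match_on_bare is not None)   # True
print(search_on_tweet.group(0))    # cameroon 1 - 1 mexico
```

This is also why the shorter phrase matched in both tools: there the score happens to sit at the very start of the string.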

Secondly, it's a good habit to write your Python regex strings as raw strings by putting an r in front:

regex = r'(mexico[\s]*\d[\s]*[-][\s]*\d[\s]*cameroon)|(mexico[\s]*\d[\s]*cameroon[\s]\d)|(mexico[\s]*\d[\s]*[-][\s]*cameroon[\s]*\d)|(cameroon[\s]*\d[\s]*[-][\s]*\d[\s]*mexico)|(cameroon[\s]*\d[\s]*mexico[\s]\d)|(cameroon[\s]*\d[\s]*[-][\s]*mexico[\s]*\d)'
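
Your pattern happens to work without the r prefix because \s and \d are not escape sequences in ordinary Python strings, so the backslashes survive (recent Python versions warn about such unrecognized escapes). That stops being true for sequences like \b. A small sketch of the difference, using a word-boundary pattern that is not part of your regex and is only here for illustration:

```python
import re

text = 'cameroon 1 - 1 mexico'

# In a normal string literal, '\b' is a single backspace character (0x08).
plain = '\bmexico\b'
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bmexico\b'

print(len(plain), len(raw))           # 8 10
print(re.search(plain, text))         # None: looks for literal backspace characters
print(re.search(raw, text).group(0))  # mexico
```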

Next, you don't need square brackets around - or \s when each is the only thing inside the brackets. The six capture groups are also more than you need just to test for a match, so I believe you can simply remove them:

regex = r'mexico\s*\d\s*-\s*\d\s*cameroon|mexico\s*\d\s*cameroon\s\d|mexico\s*\d\s*-\s*cameroon\s*\d|cameroon\s*\d\s*-\s*\d\s*mexico|cameroon\s*\d\s*mexico\s\d|cameroon\s*\d\s*-\s*mexico\s*\d'
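
If the long alternation gets hard to maintain, one option is to build it from the team names. The following is only a sketch: `score_pattern` is a hypothetical helper, not something from the re module, and it assumes \s* is acceptable everywhere, which is slightly looser than the single \s used in two of the original alternatives.

```python
import re

def score_pattern(team_a, team_b):
    """Build the three score layouts for one team order (hypothetical helper)."""
    a, b = re.escape(team_a), re.escape(team_b)
    return '|'.join([
        rf'{a}\s*\d\s*-\s*\d\s*{b}',  # "mexico 1 - 2 cameroon"
        rf'{a}\s*\d\s*{b}\s*\d',      # "mexico 1 cameroon 2"
        rf'{a}\s*\d\s*-\s*{b}\s*\d',  # "mexico 1 - cameroon 2"
    ])

# Cover both team orders and compile once so the pattern can be reused.
regex = re.compile(
    score_pattern('mexico', 'cameroon') + '|' + score_pattern('cameroon', 'mexico')
)

tweet = 'RT @remitouja: @TheJUMPsociety cameroon 1 - 1 mexico #winecup #WorldCup'
found = regex.search(tweet)
print(found.group(0))  # cameroon 1 - 1 mexico
```

A compiled pattern has the same match and search methods as the module-level functions, so the point about using search rather than match still applies.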

1 Comment

Perfect fixed it, feel a bit daft now!
