In Rubular, I have created a regular expression:

(Prerequisite|Recommended): (\w|-| )*

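It matches the bolded text in these examples, that is, each note from its keyword up to (but not including) the next period: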

Recommended: good comfort level with computers and some of the arts.

Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.

Here is a use of the regex in Python:

import re

note_re = re.compile(r'(Prerequisite|Recommended): (\w|-| )*', re.IGNORECASE)

def prereqs_of_note(note):
    match = note_re.match(note)
    if not match:
        return None
    return match.group(0) 

Unfortunately, the code returns None instead of a match:

>>> import prereqs

>>> result = prereqs.prereqs_of_note("Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.")

>>> print result
None

What am I doing wrong here?

UPDATE: Do I need re.search() instead of re.match()?

  • pythex.org says that regular expression matches that string even using Python's engine, so the problem is with how you're using the regex (I don't know Python). Commented May 9, 2010 at 22:32
  • Also, personally I'd update your regex to (Prerequisite|Recommended): ([\w -]*) so you can better capture the rest of the match. (See rubular.com/r/5v7u66vc1M) Commented May 9, 2010 at 22:35

1 Answer

You want re.search(), which scans the whole string and returns the first place the pattern matches. re.match() only tries the pattern at the very start of the string, and your string starts with "Summer", so it returns None.

>>> import re
>>> s = """Summer. 2 credits. Prerequisite: pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only."""
>>> note_re = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)
>>> note_re.search(s).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor')
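
To make the difference concrete, here is a minimal sketch that runs both calls on the same string. The names note and pattern are chosen only for illustration.

```python
import re

note = ("Summer. 2 credits. Prerequisite: pre-freshman standing or "
        "permission of instructor. Credit may not be applied toward "
        "engineering degree. S-U grades only.")
pattern = re.compile(r'(Prerequisite|Recommended): ([\w -]*)', re.IGNORECASE)

# match() only tries the pattern at index 0; the string starts with
# "Summer", so there is nothing to match there.
at_start = pattern.match(note)

# search() tries successive positions and returns the first match found.
anywhere = pattern.search(note)

print(at_start)           # None
print(anywhere.group(0))  # Prerequisite: pre-freshman standing or permission of instructor
```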

Also, if you want to match past the first period following the word "instructor", you'll have to add a literal '.' to the character class. Put the hyphen last so that it is read as a literal hyphen rather than as a range operator:

>>> re.search(r'(Prerequisite|Recommended): ([\w .-]*)', s, re.IGNORECASE).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')
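
A caveat on hyphens inside a character class: between two characters a hyphen denotes a range, so a class written as [ -.] covers every character from space through period, not just those three. The sketch below illustrates this with two small hypothetical patterns; placing the hyphen at the end of the class keeps it literal.

```python
import re

# Hyphen between space and period: a range from ' ' (0x20) to '.' (0x2E),
# which also includes characters such as '+' and ','.
as_range = re.compile(r'[ -.]')

# Hyphen at the end of the class: only space, period and hyphen match.
as_literal = re.compile(r'[ .-]')

plus_in_range = as_range.fullmatch('+') is not None
plus_in_literal = as_literal.fullmatch('+') is not None
hyphen_in_literal = as_literal.fullmatch('-') is not None

print(plus_in_range)      # True
print(plus_in_literal)    # False
print(hyphen_in_literal)  # True
```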

I would suggest making your pattern greedier so that it matches the rest of the line, assuming that is what you want (it seems to be):

>>> re.search(r'(Prerequisite|Recommended): (.*)', s, re.IGNORECASE).groups()
('Prerequisite', 'pre-freshman standing or permission of instructor. Credit may not be applied toward engineering degree. S-U grades only.')

The previous pattern, with the literal '.' added, returns the same result as .* for this example.
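
Putting this together, the function from the question could be adapted roughly as follows. This is only a sketch: it keeps the original function name and return value, swaps match() for search(), and assumes the rest-of-line pattern (.*) is what is wanted.

```python
import re

# Same keywords as in the question; (.*) captures the rest of the line.
note_re = re.compile(r'(Prerequisite|Recommended): (.*)', re.IGNORECASE)

def prereqs_of_note(note):
    """Return the matched note text, or None if no keyword is present."""
    # search() scans the whole string instead of anchoring at the start.
    found = note_re.search(note)
    if not found:
        return None
    return found.group(0)

result = prereqs_of_note(
    "Summer. 2 credits. Prerequisite: pre-freshman standing or permission "
    "of instructor. Credit may not be applied toward engineering degree. "
    "S-U grades only."
)
print(result)
```

A note without either keyword still returns None, so callers can keep checking the result the same way as before.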

1 Comment

...or maybe (.*?\.) to match only up to the first period.
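
A short sketch of that non-greedy variant, using the same sample string as the answer. The lazy quantifier .*? stops at the first period, and the trailing \. includes that period in the captured group.

```python
import re

s = ("Summer. 2 credits. Prerequisite: pre-freshman standing or permission "
     "of instructor. Credit may not be applied toward engineering degree. "
     "S-U grades only.")

# .*? consumes as little as possible, so the group ends at the first period.
first_sentence = re.search(r'(Prerequisite|Recommended): (.*?\.)', s, re.IGNORECASE)

print(first_sentence.group(2))  # pre-freshman standing or permission of instructor.
```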
