Problem omitting optional word in python3 regex

Question

I need a regex that captures 2 groups: a movie and the year. Optionally, there could be a 'from ' string between them.

My expected results are:

first_query="matrix 2013" => ('matrix', '2013')
second_query="matrix from 2013" => ('matrix', '2013')
third_query="matrix" => ('matrix', None)

I've done 2 simulations on https://regex101.com/ for python3:

I- r"(.+)(?:from ){0,1}([1-2]\d{3})" Doesn't match first_query and third_query, and also doesn't omit 'from' from group one, which is what I want to avoid.

II- r"(.+)(?:from ){1}([1-2]\d{3})" Works with second_query, but does not match first_query and third_query.

Is it possible to match all three strings, omitting the 'from ' string from the first group?

Thanks in advance.

Wiktor Stribiżew · Accepted Answer · 2018-11-26 17:03:05Z

3

You may use

^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$

See the regex demo

Details

^ - start of a string
(.+?) - Group 1: any 1+ chars other than line break chars, as few as possible
(?:\s+(?:from\s+)?([12]\d{3}))? - an optional non-capturing group matching 1 or 0 occurrences of:
- \s+ - 1+ whitespaces
- (?:from\s+)? - an optional sequence of from substring followed with 1+ whitespaces
- ([12]\d{3}) - Group 2: 1 or 2 followed with 3 digits
$ - end of string.
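
A minimal sketch of how this pattern might be used from Python, assuming `re.match` is acceptable for the use case. The helper name `parse_query` is made up for illustration. A group that does not take part in the match comes back as None, which is what the question asks for in the third case:

```python
import re

# Pattern from the answer above; parse_query is a hypothetical helper name.
PATTERN = re.compile(r"^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$")

def parse_query(query):
    """Return (title, year); year is None when no year is present."""
    m = PATTERN.match(query)
    return m.groups() if m else None

for q in ["matrix 2013", "matrix from 2013", "matrix"]:
    print(parse_query(q))
```

Because Group 1 is lazy and the year part is anchored to the end of the string, multi-word titles should also be captured whole.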

1 Comment

staticdev Over a year ago

Thank you for the clarification!

Patrick Artner · Accepted Answer · 2018-11-26 17:05:24Z

2

This will output your patterns, but returns an empty string instead of None when the year is missing:

import re

pat = r"^(.+?)(?: from)? ?(\d+)?$"


text = """matrix 2013
matrix from 2013
matrix"""

for t in text.split("\n"):
    print(re.findall(pat,t))

Output:

[('matrix', '2013')]
[('matrix', '2013')]
[('matrix', '')]

Explanation:

 ^           start of string
(.+?)        any characters, lazily (as few as possible)
(?: from)?   optional non-capturing group matching ` from`
 ?           optional space
(\d+)?$      optional digits up to the end of the string

Demo: https://regex101.com/r/VD0SZb/1
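
`re.findall` reports a group that did not take part in the match as an empty string. If None is wanted, as in the question, one possible variation (a sketch, not part of the original answer) is to call `re.match` with the same pattern and read `groups()`:

```python
import re

# Same pattern as above, used with re.match instead of re.findall.
pat = r"^(.+?)(?: from)? ?(\d+)?$"

results = []
for t in ["matrix 2013", "matrix from 2013", "matrix"]:
    m = re.match(pat, t)
    # groups() yields None for a group that did not participate in the match
    results.append(m.groups() if m else None)

print(results)
```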

jez · Accepted Answer · 2018-11-26 17:11:15Z

1

import re

pattern = re.compile( r"""
    ^\s*              # start of string (optional whitespace)
    (?P<title>\S+)    # one or more non-whitespace characters (title)
    (?:\s+from)?      # optionally, some space followed by the word 'from'
    \s*               # optional whitespace
    (?P<year>[0-9]+)? # optional digit string (year)
    \s*$              # end of string (optional whitespace)
""", re.VERBOSE )

for query in [ 'matrix 2013', 'matrix from 2013', 'matrix' ]:
    m = re.match( pattern, query )
    if m: print( m.groupdict() )

# Prints:
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': None}

Disclaimer: this regex does not contain the logic necessary to reject the first two matches on the grounds that The Matrix actually came out in 1999.
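
Note that `\S+` limits the title to a single word, so a query such as 'the matrix 1999' would not match. If multi-word titles matter, one possible adjustment (an assumption on top of the answer, not something it proposes) is to swap `\S+` for a lazy `.+?`. The names `multiword` and `parse` below are made up for illustration:

```python
import re

# Variation of the verbose pattern above with a lazy title group.
multiword = re.compile(r"""
    ^\s*                # start of string (optional whitespace)
    (?P<title>.+?)      # one or more characters, as few as possible (title)
    (?:\s+from)?        # optionally, some space followed by the word 'from'
    \s*                 # optional whitespace
    (?P<year>[0-9]+)?   # optional digit string (year)
    \s*$                # end of string (optional whitespace)
""", re.VERBOSE)

def parse(query):
    """Return a dict with 'title' and 'year', or None if nothing matches."""
    m = multiword.match(query)
    return m.groupdict() if m else None

for query in ["the matrix 1999", "the matrix from 1999", "matrix"]:
    print(parse(query))
```

A title that itself ends in digits would still have those digits read as the year, so this stays a sketch rather than a complete parser.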

1 Comment

staticdev Over a year ago

Also a nice solution! Breaking the regex into lines is a good way to document it.
