
What am I doing wrong in the regular expression matching below?

>>> import re
>>> d="30-12-2001"
>>> re.findall(r"\b[1-31][/-:][1-12][/-:][1981-2011]\b",d)
[]

6 Answers

Answer

[1-31] is a character class: it matches a single character that is either in the range 1-3 or is 1, which is just 1, 2 or 3. You cannot match a range of numbers this way unless it's a subset of 0-9. The same applies to [1981-2011], which matches exactly one character that is 0, 1, 2, 8 or 9.
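A quick way to see this in the interpreter: a bracket expression always consumes exactly one character, so [1-31] can match the 3 or the 1 on its own but never the two-character string 30. A small sketch using only the standard re module:

```python
import re

# [1-31] is the character set {1, 2, 3}: the range 1-3 plus a redundant 1
print(re.findall(r"[1-31]", "30-12-2001"))  # ['3', '1', '2', '2', '1']

# [1981-2011] is the set {0, 1, 2, 8, 9}; each match is a single character
print(re.findall(r"[1981-2011]", "1999"))  # ['1', '9', '9', '9']

# fullmatch fails: one character class cannot consume the two characters "30"
print(re.fullmatch(r"[1-31]", "30"))  # None
```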

The best solution is to match any number and then check the values afterwards in Python itself. A date such as 31-02-2012 makes no sense, and making your regex reject it would be hard; handling leap years properly as well would make it harder still. Here's a regex matching anything that looks like a dd-mm-yyyy date: \b\d{1,2}[-/:]\d{1,2}[-/:]\d{4}\b
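A minimal sketch of that approach, assuming dd-mm-yyyy order: the regex only finds things shaped like dates, and datetime.date decides whether the numbers form a real calendar date. That rejects 31-02-2012 and gets leap years right with no extra regex work. The name find_valid_dates is hypothetical, chosen for illustration.

```python
import re
from datetime import date

# Anything shaped like d-m-yyyy with -, / or : as separators (dd-mm-yyyy order assumed)
DATE_LIKE = re.compile(r"\b(\d{1,2})[-/:](\d{1,2})[-/:](\d{4})\b")

def find_valid_dates(text):
    """Return a date for every date-like match whose numbers form a real calendar date."""
    results = []
    for day, month, year in DATE_LIKE.findall(text):
        try:
            results.append(date(int(year), int(month), int(day)))
        except ValueError:
            # e.g. day 31 in February, or 29 February in a non-leap year
            pass
    return results

print(find_valid_dates("30-12-2001, 31-02-2012, 29/02/2012, 29/02/2011"))
# [datetime.date(2001, 12, 30), datetime.date(2012, 2, 29)]
```

A year range such as the question's 1981-2011 then becomes a plain comparison on the year value rather than more regex.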

However, I would strongly suggest not accepting all three of -, : and / as separators: : is usually used for times, / usually for the US way of writing a date (mm/dd/yyyy) and - for the ISO way (yyyy-mm-dd). The EU dd.mm.yyyy syntax is not handled at all.

If the string contains nothing but the date, you don't need a regex at all; use datetime.strptime() instead.

All in all, tell the user what date format you expect and parse that one, rejecting anything else. Otherwise you'll get ambiguous cases such as 04/05/2012 (is it April 5th or May 4th?).
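A sketch of "one format, reject everything else" with datetime.strptime, assuming the agreed format is dd-mm-yyyy (%d-%m-%Y); parse_user_date is a hypothetical helper name. The last two lines show the ambiguity problem: the same string yields two different valid dates depending on which format you assume.

```python
from datetime import datetime

def parse_user_date(text):
    """Parse exactly dd-mm-yyyy; strptime raises ValueError for anything else."""
    return datetime.strptime(text, "%d-%m-%Y").date()

print(parse_user_date("30-12-2001"))  # 2001-12-30

for bad in ("30/12/2001", "31-02-2012"):
    try:
        parse_user_date(bad)
    except ValueError as exc:
        # wrong separator, or a day that does not exist in that month
        print(f"rejected {bad!r}: {exc}")

# The ambiguous case: both readings are valid dates
print(datetime.strptime("04/05/2012", "%d/%m/%Y").date())  # 2012-05-04
print(datetime.strptime("04/05/2012", "%m/%d/%Y").date())  # 2012-04-05
```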


Comment

Chiming in from Australia: we usually write dates as dd/mm/yyyy here, so / isn't always used for the "US format" of a date.
Answer

[1-31] does not mean what you think it means. The square bracket syntax matches a single character from a set or range of characters, not a range of numbers. Matching a range of numbers with a regex is possible, but unwieldy.
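For illustration, here is roughly what "possible, but unwieldy" looks like: to accept a day of 1 to 31 (optional leading zero) or a year of 1981 to 2011, you have to spell out the digit patterns case by case. The DAY and YEAR patterns and the matches helper are hypothetical names for this sketch.

```python
import re

# Day of month 1-31, optional leading zero: 1-9 / 01-09, 10-29, 30-31
DAY = r"0?[1-9]|[12][0-9]|3[01]"
# Year 1981-2011, digit by digit: 1981-1989, 1990-1999, 2000-2009, 2010-2011
YEAR = r"198[1-9]|199[0-9]|200[0-9]|201[01]"

def matches(alternatives, text):
    """True if the whole of text matches one of the alternatives."""
    return re.fullmatch(f"(?:{alternatives})", text) is not None

print([s for s in ("1", "07", "19", "30", "31", "0", "32", "00") if matches(DAY, s)])
# ['1', '07', '19', '30', '31']
print([s for s in ("1980", "1981", "1999", "2011", "2012") if matches(YEAR, s)])
# ['1981', '1999', '2011']
```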

If you really want to use regular expressions for this (rather than a date parsing library), you'd be better off matching any number with the right number of digits, capturing the values, and then checking them yourself:

>>> import re
>>> d="30-12-2001"
>>> re.findall(r"\b([0-9]{1,2})[-/:]([0-9]{1,2})[-/:]([0-9]{4})\b",d)
[('30', '12', '2001')]

You'll have to do actual date verification anyway, to catch invalid dates like 31-02-2012.

(Note that [/-:] doesn't work either, because it's interpreted as a range: it matches every character from / to :, that is /, the digits 0-9 and :, but not -. Use [-/:] instead; putting the hyphen at the front prevents it from being interpreted as a range separator.)
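A short demonstration of the difference, using an arbitrary sample string chosen for this sketch:

```python
import re

sample = "-/5:"

# [/-:] is the range from "/" (0x2F) to ":" (0x3A): "/", the digits 0-9 and ":"
print(re.findall(r"[/-:]", sample))  # ['/', '5', ':']

# With the hyphen first it is a literal, so the class is just "-", "/" and ":"
print(re.findall(r"[-/:]", sample))  # ['-', '/', ':']
```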

Comment

Point noted on the range interpretation [-/:]
Answer

Regular expressions do not understand numbers; to a regular expression, 1 is just a character in the string, the same kind of thing as a. Thus, for example, [1-31] is parsed as a character class which contains the range 1-3 and the (redundant) single symbol 1.

You do not want to use regular expressions to parse dates. There is already a built-in module for handling date parsing:

>>> import datetime
>>> datetime.datetime.strptime('30-12-2001', '%d-%m-%Y')
datetime.datetime(2001, 12, 30, 0, 0) # an object representing the date.

This also does all the secondary checks (for things like an attempt to refer to Feb. 31) for you. If you want to handle multiple types of separators, you can simply .replace them in the original string so that they all turn into the same separator, then use that in your format.
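A minimal sketch of the .replace idea, assuming day-month-year order; parse_dmy is a hypothetical name. Every accepted separator is turned into "-" first, so a single strptime format covers all of them, and invalid dates such as 31/02/2012 still raise ValueError.

```python
from datetime import datetime

def parse_dmy(text):
    """Parse a day-month-year date written with -, / or : separators."""
    # Normalise "/" and ":" to "-" so one format string handles every separator
    normalized = text.replace("/", "-").replace(":", "-")
    return datetime.strptime(normalized, "%d-%m-%Y")

for s in ("30-12-2001", "30/12/2001", "30:12:2001"):
    print(parse_dmy(s))  # 2001-12-30 00:00:00 each time
```

One consequence of this design is that mixed separators such as 30/12-2001 are accepted too; if that matters, check the original string before normalising.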


Answer

You're probably doing it wrong. Some other replies here are helping you with the regex, but I suggest you use the datetime.strptime method to turn your formatted date into a datetime object, and do further logic with that object:

>>> from datetime import datetime
>>> datetime.strptime('30-12-2001', '%d-%m-%Y')
datetime.datetime(2001, 12, 30, 0, 0)

See the Python documentation for more on the strptime method and its format codes.


Answer

regexp = r'(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\d\d)'

(             # start of group #1
  0?[1-9]     #   01-09 or 1-9
  |           #   ..or
  [12][0-9]   #   10-19 or 20-29
  |           #   ..or
  3[01]       #   30, 31
)             # end of group #1
/             # followed by a "/"
(             # start of group #2
  0?[1-9]     #   01-09 or 1-9
  |           #   ..or
  1[012]      #   10, 11, 12
)             # end of group #2
/             # followed by a "/"
(             # start of group #3
  (19|20)\d\d #   19[0-9][0-9] or 20[0-9][0-9]
)             # end of group #3
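The commented breakdown can be used directly with re.VERBOSE, which ignores whitespace and # comments in the pattern. This sketch makes two assumptions that depart from the answer: the separator is widened from / to [-/] so the question's 30-12-2001 matches, and the century group is made non-capturing so groups() returns just day, month and year.

```python
import re

# Verbose form of the pattern above (separator widened to [-/], century group non-capturing)
DATE = re.compile(r"""
    (0?[1-9]|[12][0-9]|3[01])   # group 1: day, 1-31
    [-/]                        # separator
    (0?[1-9]|1[012])            # group 2: month, 1-12
    [-/]                        # separator
    ((?:19|20)\d\d)             # group 3: year, 1900-2099
""", re.VERBOSE)

print(DATE.fullmatch("30-12-2001").groups())  # ('30', '12', '2001')

# The pattern only checks digit shapes, so an impossible date still matches
print(DATE.fullmatch("31/02/2012") is not None)  # True
```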

Comment

So can you answer his question, what is he doing wrong? (though it has already been answered and accepted with a great explanation)
Answer

Maybe you can try this regex

^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/((19|20)[0-9]{2})$

This matches (01 to 31)/(01 to 12)/(1900 to 2099).
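A quick check of this style of pattern in Python, written here with 0[1-9] in the leading-zero branches so that a day or month of 00 is rejected. The ^ and $ anchors make it a whole-string test; it requires two-digit day and month and / separators, so the question's 30-12-2001 does not match, and like every regex here it still accepts 31/02/2012.

```python
import re

# Two-digit day 01-31, month 01-12, year 1900-2099, "/" separators, whole string only
PATTERN = re.compile(r"^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/((19|20)[0-9]{2})$")

for s in ("30/12/2001", "00/12/2001", "30/00/2001", "30-12-2001", "31/02/2012"):
    print(s, bool(PATTERN.match(s)))
# 30/12/2001 True
# 00/12/2001 False
# 30/00/2001 False
# 30-12-2001 False
# 31/02/2012 True
```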
