
I am trying to write code that cleans up dates in different formats (such as 3/14/2015, 03-14-2015, and 2015/3/14) by replacing them with dates in a single, standard format. So far I have written my regex, but it's not working the way I would like.

import pyperclip,re

dateRegex = re.compile(r'''
    (\d|\d{2}|\d{4})  # match 1 digit, or two digits, or four digits
    (\s|-|\.|\/) # match either a space or a dash or a period or a backslash
    (\d{2}|\d) # match either 2 digits or one
    (\s|-|\.\/) # match either a space or a dash or a period or a backslash
    (\d{4}|\d{2}) # match either 4 or 2 digits.
    ''',)

text = "12/25/0000, 10.21.1955, 10-21-1985 6-5-1995 2004/2/21 5/25/2111 4999.2.21 "
a = dateRegex.findall(text):

Any idea why this isn't working?

  • Have you considered just using e.g. python-dateutil? Also it's "digits", FYI. Commented Jul 13, 2016 at 13:54
  • Related: stackoverflow.com/q/7048828/189134 Commented Jul 13, 2016 at 13:54
  • You have some stray syntax in this code, such as a trailing comma after your pattern and a colon after the findall (the colon is a SyntaxError). I hope these are just mistakes made when copying the code. Likewise, I hope the comments on each line are only there for SO demonstration purposes and not actually in your code, because without re.VERBOSE they wouldn't be parsed as comments but as part of the pattern. Commented Jul 13, 2016 at 13:58
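
The third comment's point about the inline comments can be checked directly. This is a minimal sketch, not code from the thread: PATTERN, TEXT and FIXED are names introduced here, and PATTERN is the question's regex with shortened inline comments (the trailing comma and stray colon are simply not carried over). It compiles the pattern without re.VERBOSE, with it, and with a missing | restored in the second separator group, which as written only accepts a space, a dash, or the two-character sequence "./".

```python
import re

# The question's pattern with shortened inline comments. Group 4 is
# missing a "|" between \. and \/, so it only accepts a space, a dash,
# or the two-character sequence "./".
PATTERN = r'''
    (\d|\d{2}|\d{4})   # 1, 2, or 4 digits
    (\s|-|\.|\/)       # separator
    (\d{2}|\d)         # 2 or 1 digits
    (\s|-|\.\/)        # separator, missing a pipe
    (\d{4}|\d{2})      # 4 or 2 digits
    '''

TEXT = "12/25/0000, 10.21.1955, 10-21-1985 6-5-1995 2004/2/21 5/25/2111 4999.2.21 "

# Without re.VERBOSE, the newlines, indentation and "# ..." text must
# appear literally in TEXT, so nothing matches.
print(re.compile(PATTERN).findall(TEXT))  # prints []

# With re.VERBOSE, whitespace and comments are ignored, but group 4
# still rejects a lone "/" or "." as the second separator.
print(re.compile(PATTERN, re.VERBOSE).findall(TEXT))
# prints [('10', '-', '21', '-', '1985'), ('6', '-', '5', '-', '1995')]

# Restoring the missing "|" in group 4 lets every sample date match.
FIXED = PATTERN.replace(r'(\s|-|\.\/)', r'(\s|-|\.|\/)')
print(len(re.compile(FIXED, re.VERBOSE).findall(TEXT)))  # prints 7
```

The middle result matches the "only dates with a -" behaviour raised later in the comments under the answer.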

1 Answer


This code works:

import re
p = re.compile(r'''(\d|\d{2}|\d{4})  # match 1 digit, or two digits, or four digits
                    ([-\s./]) # match a dash, a whitespace character, a period, or a forward slash
                    (\d{1,2}) # match either 2 digits or one
                    ([-\s./]) # match a dash, a whitespace character, a period, or a forward slash
                    (\d{4}|\d{2}) # match either 4 or 2 digits''', re.VERBOSE)
test_str = "12/25/0000, 10.21.1955, 10-21-1985 6-5-1995 2004/2/21 5/25/2111 4999.2.21 "

print(p.findall(test_str))

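You forgot the re.VERBOSE flag, which means:

Whitespace in the pattern is ignored (except inside a character class or when escaped), and text from an unescaped # to the end of the line is treated as a comment.

The character classes [-\s./] also fix a second bug: your second separator group (\s|-|\.\/) is missing a | between \. and \/, so it can never match a period or a slash on its own.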
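
The answer stops at findall, but the question's goal is to replace every date with one standard format. One way to do that is re.sub with a replacement function. The sketch below makes several assumptions not stated in the question: the target format is ISO 8601 (YYYY-MM-DD), a date that starts with four digits is year/month/day, and any other date is month/day/year with a four-digit year, following the question's 3/14/2015 example. DATE_RE, to_iso and normalize_dates are hypothetical names, and the code does no calendar validation, so years like 0000 or 4999 pass through unchanged.

```python
import re

# Hypothetical pattern: a year-first date (YYYY sep M sep D) or a year-last
# date (M sep D sep YYYY), where sep is a dash, whitespace, period, or slash.
DATE_RE = re.compile(r'''
    \b(?:
        (?P<y1>\d{4}) [-\s./] (?P<m1>\d{1,2}) [-\s./] (?P<d1>\d{1,2})   # e.g. 2015/3/14
      | (?P<m2>\d{1,2}) [-\s./] (?P<d2>\d{1,2}) [-\s./] (?P<y2>\d{4})   # e.g. 3/14/2015
    )\b
    ''', re.VERBOSE)


def to_iso(match):
    """Rebuild one matched date as YYYY-MM-DD; no calendar validation."""
    if match.group('y1') is not None:
        year, month, day = match.group('y1', 'm1', 'd1')
    else:
        year, month, day = match.group('y2', 'm2', 'd2')
    # Zero-pad one-digit months and days.
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"


def normalize_dates(text):
    """Replace every date DATE_RE finds in text with its YYYY-MM-DD form."""
    return DATE_RE.sub(to_iso, text)


sample = "12/25/0000, 10.21.1955, 10-21-1985 6-5-1995 2004/2/21 5/25/2111 4999.2.21 "
print(normalize_dates(sample))
# prints 0000-12-25, 1955-10-21, 1985-10-21 1995-06-05 2004-02-21 2111-05-25 4999-02-21
print(normalize_dates("3/14/2015, 03-14-2015, and 2015/3/14"))
# prints 2015-03-14, 2015-03-14, and 2015-03-14
```

Two-digit years such as 3/14/15 are deliberately left unmatched, since the century cannot be inferred from the text alone. The named groups avoid counting tuple positions the way the findall output requires.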


4 Comments

Awesome. Thanks for the help
Any idea why it's only printing dates with a "-" and not printing the other formatted dates?
\d|\d{2}|\d{4} => \d(?:\d(?:\d{2})?)? and \d{4}|\d{2} => \d{2}(?:\d{2})?. Isn't it better like this?
@CasimiretHippolyte it indeed works too but I find it harder to read
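
The factored forms in the comment above accept the same digit counts as the plain alternations; the greedy ? quantifiers just try the longest option first. A quick check, as a sketch: accepted_lengths, ALTERNATION, FACTORED and TEXT are names introduced here, and the patterns are written without re.VERBOSE so each fits on one line.

```python
import re

TEXT = "12/25/0000, 10.21.1955, 10-21-1985 6-5-1995 2004/2/21 5/25/2111 4999.2.21 "

# The answer's pattern, with groups 1 and 5 as plain alternations.
ALTERNATION = re.compile(r'(\d|\d{2}|\d{4})([-\s./])(\d{1,2})([-\s./])(\d{4}|\d{2})')

# The same pattern with groups 1 and 5 in the factored form from the comment.
FACTORED = re.compile(r'(\d(?:\d(?:\d{2})?)?)([-\s./])(\d{1,2})([-\s./])(\d{2}(?:\d{2})?)')


def accepted_lengths(pattern):
    """Digit-string lengths from 1 to 6 that pattern matches in full."""
    return [n for n in range(1, 7) if re.fullmatch(pattern, "7" * n)]


print(accepted_lengths(r'\d|\d{2}|\d{4}'))       # prints [1, 2, 4]
print(accepted_lengths(r'\d(?:\d(?:\d{2})?)?'))  # prints [1, 2, 4]
print(accepted_lengths(r'\d{4}|\d{2}'))          # prints [2, 4]
print(accepted_lengths(r'\d{2}(?:\d{2})?'))      # prints [2, 4]

# On the sample text, both patterns find the same seven dates.
print(ALTERNATION.findall(TEXT) == FACTORED.findall(TEXT))  # prints True
```

Since the results are identical on the sample text, choosing between the two forms is mostly a matter of readability.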
