What's the best way to get datestrings from a website using Python?

The datestrings can be, for example, in the forms of:

  • April 1st, 2011
  • April 2nd, 2011
  • April 23, 2011
  • 4/2/2011
  • 04/23/2011

Would this have to be a ton of regex? What's the most elegant solution?

2 Answers

Consider this lib: http://code.google.com/p/parsedatetime/

From its examples Wiki page, here are a couple of formats it can handle that look relevant to your question:

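import parsedatetime as pdt  # older releases: import parsedatetime.parsedatetime as pdt
p = pdt.Calendar()
result = p.parseDateText("March 5th, 1980")
result = p.parseDate("4/4/80")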

EDIT: now I notice it's actually a duplicate of this SO question where the same library was recommended!
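If adding a dependency is not an option, the formats listed in the question can also be handled with the standard library alone. The following is only a rough sketch, not how parsedatetime works internally: the `parse_date` helper and the `FORMATS` tuple are hypothetical names, the format list is assumed from the question's examples, month/day order is assumed to be US style, and month names are assumed to be English (`%B` and `%b` depend on the locale).

```python
import re
from datetime import datetime

# Formats assumed from the examples in the question (US month/day order).
FORMATS = ("%B %d, %Y", "%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")

def parse_date(text):
    """Return a datetime.date for text, or None if no assumed format matches."""
    # Drop ordinal suffixes so "1st" becomes "1" and "23rd" becomes "23".
    cleaned = re.sub(r'(\d)(st|nd|rd|th)\b', r'\1', text.strip(),
                     flags=re.IGNORECASE)
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next format
    return None
```

Calling `parse_date("April 2nd, 2011")` and `parse_date("4/2/2011")` should give the same date. Anything outside the assumed formats returns `None` rather than raising, so the caller decides how to treat unrecognized strings.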

1 Comment

I ended up using six regex strings to find the most common date formats, but I'll give you the answer
    # Patterns are lowercase: match with re.IGNORECASE or against lowercased text.
    month = r'(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]{0,6}'
    regex_strings = [r'%s(\.| )\d{1,2},? \d{2,4}' % month, # Month.Day, Year
                     r'\d{1,2} %s,? \d{4}' % month, # Day Month Year(4)
                     r'%s \d{1,2}\w{2},? \d{4}' % month, # Mon Day(th), Year
                     r'\d{1,2} %s' % month, # Day Month
                     r'\d{1,2}\.\d{1,2}\.\d{4}', # Month.Day.Year
                     r'\d{1,2}/\d{1,2}/\d{2,4}', # Month/Day/Year{2,4}
                     ]
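
The list above only defines the patterns. One possible way to apply them to page text is to join them into a single case-insensitive alternation and collect every match. This is a sketch under assumptions: `MONTH`, `PATTERNS`, `DATE_RE`, and `find_date_strings` are illustrative names, the month pattern is assumed to include `oct`, and the groups are made non-capturing so that `finditer` returns whole matches.

```python
import re

# Month prefix followed by up to six more letters ("apr" also matches "April").
MONTH = r'(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]{0,6}'

PATTERNS = [
    r'%s(?:\.| )\d{1,2},? \d{2,4}' % MONTH,  # Month Day, Year
    r'\d{1,2} %s,? \d{4}' % MONTH,           # Day Month Year
    r'%s \d{1,2}\w{2},? \d{4}' % MONTH,      # Month Day(th), Year
    r'\d{1,2} %s' % MONTH,                   # Day Month
    r'\d{1,2}\.\d{1,2}\.\d{4}',              # Month.Day.Year
    r'\d{1,2}/\d{1,2}/\d{2,4}',              # Month/Day/Year
]

# Alternatives are tried left to right at each position, so the longer
# "Day Month Year" pattern is listed before the bare "Day Month" one.
DATE_RE = re.compile(r'\b(?:%s)\b' % '|'.join(PATTERNS), re.IGNORECASE)

def find_date_strings(text):
    """Return every substring of text that matches one of the patterns."""
    return [m.group(0) for m in DATE_RE.finditer(text)]
```

The text passed in would normally be the visible text of the page after the HTML has been stripped. The patterns are deliberately loose, so a word that merely starts with a month prefix and is followed by numbers can also match, and the results are best treated as candidates to validate.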
