I am trying to parse date strings that each contain multiple days. It seems that the datetime.strptime function does not support regular expressions, so I cannot get it to ignore one day at a time. Is there an easy solution that I am missing?

Here are some examples:

March 20 & June 8, 2011

September 4 & 27, 2010

February 15, December 5 & 6, 2013

I know that these examples differ quite drastically, but I am hoping to get a solution for even one of them. An approach that works across a wide range of inputs with some formatting parameter would be awesome.

Additionally, there may be cases where the date is formatted differently, which I assume should be easier to handle:

7/2/2011 & 8/9/2011

  • Can you split the strings by & and , to do some initial parsing first? Commented Aug 21, 2018 at 2:59
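
For the numeric format, this suggestion can be sketched as follows: split on & and hand each piece to strptime. The helper name parse_numeric_dates is hypothetical, and the month-first format %m/%d/%Y is an assumption, since the question does not say whether 7/2/2011 means July 2 or February 7.

```python
from datetime import datetime


def parse_numeric_dates(text, fmt="%m/%d/%Y"):
    """Split on '&' and parse each piece (month-first order is an assumption)."""
    return [datetime.strptime(piece.strip(), fmt) for piece in text.split("&")]


parsed = parse_numeric_dates("7/2/2011 & 8/9/2011")
print(parsed)
```

Passing fmt="%d/%m/%Y" instead would read the same strings as day-first dates.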

5 Answers

Probably not the best way to do it, but this is my attempt:

import re
from datetime import datetime

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"
date_group = [date1,date2,date3]

for date in date_group:
    result = re.findall(r"\d{4}|[A-Z][a-z]+ \d{1,2} & \d{1,2}|[A-Z][a-z]+ \d{1,2}", date)
    year = result[-1]
    for i in range(len(result)-1):
        d = result[i].split(" ")
        try:
            d.remove("&")
        except ValueError:
            pass
        finally:
            for a in range(1,len(d)):
                date = d[0]+'{:02d}'.format(int(d[a]))+year
                time_date = datetime.strptime(date,"%B%d%Y")
                print (time_date)

Result:

2011-03-20 00:00:00
2011-06-08 00:00:00
2010-09-04 00:00:00
2010-09-27 00:00:00
2013-02-15 00:00:00
2013-12-05 00:00:00
2013-12-06 00:00:00

Basically, it extracts the year first and then the dates. It will not work if a string contains multiple years, though.
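
The same idea (take the trailing year, then walk the month and day tokens) could also be wrapped in a function that returns the dates instead of printing them. This is only a sketch: parse_multi_day is a made-up name, and it assumes full English month names and exactly one four-digit year at the end of the string.

```python
import re
from datetime import datetime


def parse_multi_day(text):
    """Return datetimes for strings like 'February 15, December 5 & 6, 2013'."""
    # Assumption: the string ends with a single four-digit year.
    body, year = re.fullmatch(r"(.*?),?\s*(\d{4})\s*", text).groups()
    dates = []
    month = None
    # Tokens are either month names or day numbers; '&' and ',' are skipped.
    for token in re.findall(r"[A-Za-z]+|\d{1,2}", body):
        if token.isdigit():
            dates.append(datetime.strptime(f"{month} {token} {year}", "%B %d %Y"))
        else:
            month = token  # applies to every day number that follows
    return dates


for d in parse_multi_day("February 15, December 5 & 6, 2013"):
    print(d.date())
```

A string that does not end in a year, or that lists a day before any month, raises an exception rather than being guessed at.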

3 Comments

\w also matches _ and digits, so it could recognise something like M4r_h
I changed it to [A-Za-z]
Better would be [A-Z][a-z]+, which requires an uppercase first letter followed only by lowercase letters
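
The difference between the three patterns discussed in these comments is easy to check. A small sketch (the sample M4r_h comes from the first comment; the other samples and the dictionary keys are illustrative):

```python
import re

samples = ["March", "M4r_h", "MARCH"]
patterns = {
    "word": r"\w+",                 # letters, digits and underscore
    "letters": r"[A-Za-z]+",        # ASCII letters in any case
    "capitalized": r"[A-Z][a-z]+",  # one uppercase letter, then lowercase
}

# For each pattern, keep the samples that it matches in full.
matches = {
    name: [s for s in samples if re.fullmatch(pattern, s)]
    for name, pattern in patterns.items()
}
print(matches)
```

Only the last pattern rejects both M4r_h and MARCH while still accepting March.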

This is one approach using the datetime module:

Demo:

import datetime
d1 = "March 20 & June 8, 2011"
d2 = "February 15, December 5 & 6, 2013"


def getDate(in_value):
    result = []
    in_value = in_value.split(",")
    year = in_value.pop(-1)
    for dateV in in_value:
        if "&" in dateV:
            temp = []
            month = None
            for i in dateV.split():
                if i.isdigit():
                    temp.append(datetime.datetime.strptime("{}-{}-{}".format(year, month, i).strip(), "%Y-%B-%d").strftime("%m/%d/%Y"))
                elif i != "&":
                    # a new month name applies to the day numbers that follow it
                    month = i
            result.append(" & ".join(temp))
        else:
            result.append(datetime.datetime.strptime(dateV.strip() + year, "%B %d %Y").strftime("%m/%d/%Y"))
    return ", ".join(result)

print( getDate(d1) )
print( getDate(d2) )

Output:

03/20/2011 & 06/08/2011
02/15/2013, 12/05/2013 & 12/06/2013

I would start by splitting the date strings into their individual parts:

import re
def split_date(d):
    return re.split(r'[,&]', d)
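
Note that the pieces this produces are fragments rather than complete dates: only the last one carries the year, and a bare day such as 27 has no month of its own. A quick sketch of what comes back, with the surrounding whitespace stripped (the split_date_parts name is hypothetical):

```python
import re


def split_date_parts(d):
    """Split on ',' or '&' and drop the whitespace around each piece."""
    return [part.strip() for part in re.split(r"[,&]", d)]


parts = split_date_parts("September 4 & 27, 2010")
print(parts)  # ['September 4', '27', '2010']
```

The month and year still have to be carried over to the bare day numbers in a second step.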

All of the above answers have been good and I figured out another method that allows for multiple years:

from datetime import datetime
import re

date1 = "March 20 & June 8, 2011"
date2 = "September 4 & 27, 2010"
date3 = "February 15, December 5 & 6, 2013"


def extract_dates(date):
    dates = []
    last_index = None
    for year in re.finditer(r'\d{4}', date):
        if last_index is None:
            text = date[:year.span(0)[0]]
        else:
            text = date[last_index:year.span(0)[0]]
        last_index = year.span(0)[1]

        months = list(re.finditer(r'[A-Za-z]+', text))
        for m, month in enumerate(months):
            if m == len(months) - 1:
                text_days = text[month.span(0)[1]:]
            else:
                text_days = text[month.span(0)[1]:months[m + 1].span(0)[0]]

            for day in re.finditer(r'\d{1,2}', text_days):
                dates.append(datetime.strptime(month.group(0) + ' ' + day.group(0) + ', ' + year.group(0), '%B %d, %Y'))

    return dates


print(extract_dates(date1))
print(extract_dates(date2))
print(extract_dates(date3))
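
The three strings above each contain a single year, so they do not exercise the multi-year case. Here is a more compact sketch of the same idea, run on a made-up input with two years. The function name extract_dates_multi_year and the sample string are illustrative, and it assumes full English month names and four-digit years.

```python
import re
from datetime import datetime


def extract_dates_multi_year(text):
    """Attach each month/day group to the next four-digit year that follows it."""
    dates = []
    # Each match is (text before a year, the year itself).
    for chunk, year in re.findall(r"(.*?)(\d{4})", text):
        month = None
        for token in re.findall(r"[A-Za-z]+|\d{1,2}", chunk):
            if token.isdigit():
                dates.append(datetime.strptime(f"{month} {token}, {year}", "%B %d, %Y"))
            else:
                month = token  # applies to the day numbers that follow
    return dates


print(extract_dates_multi_year("March 20, 2011 & June 8 & 9, 2012"))
```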

Pyparsing is a handy Python module for parsing strings like this. Here is an annotated parser that cracks your input strings and gives months, days, and years for each:

import pyparsing as pp
import calendar

COMMA = pp.Suppress(',')
AMP = pp.Suppress('&')
DASH = pp.Suppress('-')

# use pyparsing-defined integer expression, which auto-converts parsed str's to int's
day_number = pp.pyparsing_common.integer()
# day numbers only go from 1-31
day_number.addCondition(lambda t: 1 <= t[0] <= 31)

# not in the spec, but let's support day ranges, too!
day_range = day_number("first") + DASH + day_number("last")
# parse-time conversion from "4-6" to [4, 5, 6]
day_range.addParseAction(lambda t: list(range(t.first, t.last+1)))

# this function will come in handy to build list parsers of day numbers and month-day
expr_list = lambda expr: expr + pp.ZeroOrMore(COMMA + expr) + pp.Optional(AMP + expr)

# support "10", "10 & 11", "10, 11 & 12"
day_list = expr_list(day_range | day_number)

# get the month names from the calendar module
month_name = pp.oneOf(calendar.month_name[1:])

# an expression containing a month name and a list of 1 or more day numbers
date_expr = pp.Group(month_name("month") + day_list("days"))

# use expr_list again to support multiple date_exprs separated by commas and ampersands
date_list = expr_list(date_expr)

year_number = pp.pyparsing_common.integer()
# year numbers must be 2000 or later
year_number.addCondition(lambda t: t[0] >= 2000)

# put all together into a single parser expression
full_date = date_list("dates") + COMMA + year_number("year")

tests = """\
March 20 & June 8, 2011
September 4 & 27, 2010
February 15, December 5 & 6, 2013
September 4-6, 2010
"""

full_date.runTests(tests)

Prints:

March 20 & June 8, 2011
[['March', 20], ['June', 8], 2011]
- dates: [['March', 20], ['June', 8]]
  [0]:
    ['March', 20]
    - days: [20]
    - month: 'March'
  [1]:
    ['June', 8]
    - days: [8]
    - month: 'June'
- year: 2011


September 4 & 27, 2010
[['September', 4, 27], 2010]
- dates: [['September', 4, 27]]
  [0]:
    ['September', 4, 27]
    - days: [4, 27]
    - month: 'September'
- year: 2010


February 15, December 5 & 6, 2013
[['February', 15], ['December', 5, 6], 2013]
- dates: [['February', 15], ['December', 5, 6]]
  [0]:
    ['February', 15]
    - days: [15]
    - month: 'February'
  [1]:
    ['December', 5, 6]
    - days: [5, 6]
    - month: 'December'
- year: 2013


September 4-6, 2010
[['September', 4, 5, 6], 2010]
- dates: [['September', 4, 5, 6]]
  [0]:
    ['September', 4, 5, 6]
    - days: [4, 5, 6]
    - month: 'September'
- year: 2010

To get (year, month, day) tuples, we add another parse action and rerun the tests:

print("convert parsed fields into (year, month-name, date) tuples")
def expand_dates(t):
    return [(t.year, d.month, dy) for d in t.dates for dy in d.days]
full_date.addParseAction(expand_dates)

full_date.runTests(tests)

Prints:

convert parsed fields into (year, month-name, date) tuples

March 20 & June 8, 2011
[(2011, 'March', 20), (2011, 'June', 8)]


September 4 & 27, 2010
[(2010, 'September', 4), (2010, 'September', 27)]


February 15, December 5 & 6, 2013
[(2013, 'February', 15), (2013, 'December', 5), (2013, 'December', 6)]


September 4-6, 2010
[(2010, 'September', 4), (2010, 'September', 5), (2010, 'September', 6)]

Finally, make them into datetime.date objects with another parse action:

print("convert (year, month-name, date) tuples into datetime.date's")
# define mapping of month-name to month number 1-12
month_map = {name: num for num,name in enumerate(calendar.month_name[1:], start=1)}
from datetime import date
full_date.addParseAction(pp.tokenMap(lambda t: date(t[0], month_map[t[1]], t[2])))
full_date.runTests(tests)

Prints:

convert (year, month-name, date) tuples into datetime.date's

March 20 & June 8, 2011
[datetime.date(2011, 3, 20), datetime.date(2011, 6, 8)]


September 4 & 27, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 27)]


February 15, December 5 & 6, 2013
[datetime.date(2013, 2, 15), datetime.date(2013, 12, 5), datetime.date(2013, 12, 6)]


September 4-6, 2010
[datetime.date(2010, 9, 4), datetime.date(2010, 9, 5), datetime.date(2010, 9, 6)]
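
If adding a dependency is not an option, the day-range expansion used above ("4-6" becoming 4, 5, 6) can be approximated with the standard library. This is a sketch rather than a replacement for the grammar: expand_days is a hypothetical helper, it only handles a list of day numbers and ranges (no month names or years), and it does not validate that days fall in 1-31.

```python
import re


def expand_days(text):
    """Expand a day list such as '4-6, 9 & 11' into individual day numbers."""
    days = []
    # Each match is a day number, optionally followed by '-' and an end day.
    for first, last in re.findall(r"(\d{1,2})(?:\s*-\s*(\d{1,2}))?", text):
        end = int(last) if last else int(first)
        days.extend(range(int(first), end + 1))
    return days


print(expand_days("4-6, 9 & 11"))
```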
