Extracting dates from a string in Python

Question

I have a string as

fmt_string2 = 'I want to apply for leaves from 12/12/2017 to 12/18/2017'

I want to extract both dates. My code needs to be robust, because the dates can be in any format (for example 12 January 2017 or 12 Jan 17) and their position in the string can also change. For the string above I tried:

''.join(fmt_string2.split()[-1].split('.')[::-10])

But this relies on the position of the date, which I don't want. Can anyone help me write robust code for extracting dates?

See the third-party library dateparser.

– Open AI - Opting Out, Commented Jul 11, 2017 at 6:18
Possible duplicate of Python - finding date in a string

– acsrujan, Commented Jul 11, 2017 at 6:21
I tried dateparser, but it's not helping in this case.

– Geetanjali Bisht, Commented Jul 11, 2017 at 6:31

arif · Accepted Answer · 2017-07-11 06:35:20Z

12

If 12/12/2017, 12 January 2017, and 12 Jan 17 are the only possible patterns, then the following regex-based code should be enough.

import re

string = 'I want to apply for leaves from 12/12/2017 to 12/18/2017 I want to apply for leaves from 12 January 2017 to ' \
       '12/18/2017 I want to apply for leaves from 12/12/2017 to 12 Jan 17 '

# raw string (r'...') so that \d reaches the regex engine unchanged
matches = re.findall(r'(\d{2}[/ ](\d{2}|January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec)[/ ]\d{2,4})', string)

for match in matches:
    print(match[0])

Output:

12/12/2017
12/18/2017
12 January 2017
12/18/2017
12/12/2017
12 Jan 17

To understand the regex, experiment with it on regex101.
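The regex only returns strings. A minimal sketch of turning them into `datetime.date` objects with the standard library, assuming numeric dates are month/day/year (as in 12/18/2017) and textual dates are day, month, year. The helper name `to_date` and the `FORMATS` list are illustrative, not part of the answer:

```python
from datetime import date, datetime

# Candidate formats, tried in order. Assumption: numeric dates are month/day/year.
FORMATS = (
    "%m/%d/%Y",  # 12/18/2017
    "%m/%d/%y",  # 12/18/17
    "%d %B %Y",  # 12 January 2017
    "%d %b %Y",  # 12 Jan 2017
    "%d %B %y",  # 12 January 17
    "%d %b %y",  # 12 Jan 17
)


def to_date(text: str) -> date:
    """Hypothetical helper: parse one matched string with the first format that fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format does not fit; try the next one
    raise ValueError(f"unrecognised date: {text!r}")


for s in ["12/12/2017", "12 January 2017", "12 Jan 17"]:
    print(s, "->", to_date(s).isoformat())
```

Trying formats in a fixed order keeps the behaviour predictable; note that `%B`/`%b` follow the current locale, so non-English month names would need a different approach.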

answered Jul 11, 2017 at 6:35

arif



Ayush Vatsyayan · Accepted Answer · 2017-07-14 09:15:28Z

5

Using Regular Expressions

Rather than relying entirely on a regex, I suggest the following approach:

import re
from dateutil.parser import parse

Sample Text

text = """
I want to apply for leaves from 12/12/2017 to 12/18/2017
then later from 12 January 2018 to 18 January 2018
then lastly from 12 Feb 2018 to 18 Feb 2018
"""

The regular expression finds anything of the form "from A to B". The advantage is that I don't have to handle each date format separately and keep extending the regex; the pattern stays generic.

pattern = re.compile(r'from (.*) to (.*)')
matches = pattern.findall(text)

For the sample text, the regex returns:

[('12/12/2017', '12/18/2017'), ('12 January 2018', '18 January 2018'), ('12 Feb 2018', '18 Feb 2018')]

For each match, parse both dates. parse raises a ValueError for a value that isn't a date, so the except block prints the match and skips it.

for val in matches:
    try:
        dt_from = parse(val[0])
        dt_to = parse(val[1])

        print("Leave applied from", dt_from.strftime('%d/%b/%Y'), "to", dt_to.strftime('%d/%b/%Y'))
    except ValueError:
        print("skipping", val)

Output:

Leave applied from 12/Dec/2017 to 18/Dec/2017
Leave applied from 12/Jan/2018 to 18/Jan/2018
Leave applied from 12/Feb/2018 to 18/Feb/2018

Using pyparsing

The limitation of regular expressions is that they can become very complex once you make them flexible enough for less straightforward input, for example:

text = """
I want to apply for leaves from start 12/12/2017 to end date 12/18/2017 some random text
then later from 12 January 2018 to 18 January 2018 some random text
then lastly from 12 Feb 2018 to 18 Feb 2018 some random text
"""

So the third-party pyparsing module is the best fit here.

import calendar
import pyparsing as pp

The approach is to build a grammar that can parse the entire text.

Create a list of month names (full and abbreviated) to use as pyparsing keywords:

months_list= []
for month_idx in range(1, 13):
    months_list.append(calendar.month_name[month_idx])
    months_list.append(calendar.month_abbr[month_idx])

# join the list to use it as pyparsing keyword
month_keywords = " ".join(months_list)
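A quick look at what this loop produces, assuming the default C/English locale (the `calendar` names follow the current locale). Because the full name and the abbreviation of May are identical, `May` appears twice; an optional tweak, not part of the original answer, is to de-duplicate with `dict.fromkeys`, which keeps the original order:

```python
import calendar

months_list = []
for month_idx in range(1, 13):
    months_list.append(calendar.month_name[month_idx])  # e.g. 'January'
    months_list.append(calendar.month_abbr[month_idx])  # e.g. 'Jan'

print(months_list[:4])           # ['January', 'Jan', 'February', 'Feb'] in a C/English locale
print(months_list.count("May"))  # 2: 'May' is both the full name and the abbreviation

# Optional: drop the duplicate 'May' while preserving order.
unique_months = list(dict.fromkeys(months_list))
month_keywords = " ".join(unique_months)
print(len(months_list), len(unique_months))  # 24 23
```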

Grammar for parsing:

# date separator - can be one of '/', '.', or ' '
separator = pp.Word("/. ")

# Grammar for numeric date, e.g. 12/12/2018
numeric_date = pp.Combine(pp.Word(pp.nums, max=2) + separator + pp.Word(pp.nums, max=2) + separator + pp.Word(pp.nums, max=4))

# Grammar for text date, e.g. 12 Jan 2018 or 12/Jan/2018
text_date = pp.Combine(pp.Word(pp.nums, max=2) + separator + pp.oneOf(month_keywords) + separator + pp.Word(pp.nums, max=4))

# Either numeric or text date
date_pattern = numeric_date | text_date

# Final grammar: from x to y
# pp.Keyword matches only the whole word; pp.Word("from") would match any run of the letters f, r, o, m
pattern = pp.Suppress(pp.SkipTo("from") + pp.Keyword("from") + pp.Optional("start") + pp.Optional("date")) + date_pattern
pattern += pp.Suppress(pp.Keyword("to") + pp.Optional("end") + pp.Optional("date")) + date_pattern

# Group each from/to pair; the text can contain several pairs
pattern = pp.OneOrMore(pp.Group(pattern))

Parse the input text:

result = pattern.parseString(text)

# Print result
for match in result:
    print("from", match[0], "to", match[1])

Output:

from 12/12/2017 to 12/18/2017
from 12 January 2018 to 18 January 2018
from 12 Feb 2018 to 18 Feb 2018

edited Jul 14, 2017 at 9:15

answered Jul 11, 2017 at 8:07

Ayush Vatsyayan


6 Comments

Geetanjali Bisht Over a year ago

In the code above, if a value is missing (e.g. 18/Dec/2017), I want None in that position, but I get an empty string ''.

Ayush Vatsyayan Over a year ago

@GeetanjaliBisht Please elaborate. As I understand it, you get a blank string instead of None; in that case you can always convert it in Python.

Geetanjali Bisht Over a year ago

If I leave out dt_to.strftime, I get ('skipping', ('12/12/2017', '')). Instead I want ('12/12/2017', 'none'). I tried to do this but couldn't.

Ayush Vatsyayan Over a year ago

That won't come from the regex, because it matches whatever is in the text. If you want None, you can add a check in the except ValueError block, where it currently does print("skipping", val). Since val is a tuple, build a new one instead of assigning to val[1], e.g. if val[1] == '': val = (val[0], None)
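Because `findall` returns tuples, which cannot be modified in place, empty captures have to be replaced by building a new tuple. A minimal sketch, using a hypothetical helper `blank_to_none` and a sample line whose end date is missing:

```python
import re


def blank_to_none(pair):
    """Hypothetical helper: replace empty captured strings with None."""
    return tuple(value if value else None for value in pair)


pattern = re.compile(r'from (.*) to (.*)')
# The end date is missing, so the second group captures ''.
matches = pattern.findall("I want to apply for leaves from 12/12/2017 to ")
cleaned = [blank_to_none(m) for m in matches]
print(cleaned)  # [('12/12/2017', None)]
```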

Geetanjali Bisht Over a year ago

In the code above, if I give 'from date 12/12/2017 to end date 12/18/2017', the words 'date' and 'end date' end up in my output, but I only want the dates. How can I solve this?
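One way to address this with the standard library alone is to stop capturing everything between `from` and `to` and capture only a date-shaped token, letting a few optional filler words sit outside the groups, much like the `Optional(...)` pieces of the pyparsing grammar. A sketch, assuming the date shapes used in the answers (numeric with `/` or `.`, or day, month name, year); the filler-word list `start|end|date` is an assumption to extend as needed:

```python
import re

# Assumption: English month names and abbreviations, as in the regex answer.
MONTH = (r"(?:January|February|March|April|May|June|July|August|September"
         r"|October|November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec)")
# One capture group per date: 12/18/2017, 12.18.2017, 12 January 2018 or 12 Jan 18.
DATE = r"(\d{1,2}[/.]\d{1,2}[/.]\d{2,4}|\d{1,2} " + MONTH + r" \d{2,4})"
# Hypothetical list of filler words allowed before a date (non-capturing).
FILLER = r"(?:(?:start|end|date)\s+)*"

pattern = re.compile(r"\bfrom\s+" + FILLER + DATE + r"\s+to\s+" + FILLER + DATE)

print(pattern.findall("from date 12/12/2017 to end date 12/18/2017"))
# [('12/12/2017', '12/18/2017')]
```

Since only the two `DATE` groups capture, `findall` returns just the dates, and anything after the second date (such as "some random text") is ignored.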
