Regex pattern to match datetime in Python

Question

I have a string that contains datetimes, and I am trying to split the string at each datetime occurrence:

data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

What I am doing:

out=re.split('^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$',data)

What I get:

["2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"]

What I want:

["2018-03-14 06:08:18, he went on","2018-03-15 06:08:18, lets play"]

Can there be cases when there is no whitespace between the items? Can we assume we want to split on at least 1 whitespace character followed by a date?
– Wiktor Stribiżew, Commented Jul 18, 2018 at 7:15
Well, I meant to suggest something like r'\s+(?=(?:(?:20)?[01]?[0-9])-(?:1[0-2]|0?[0-9])-(?:[0-2]?[0-9]|3[01]))' with split.
– Wiktor Stribiżew, Commented Jul 18, 2018 at 7:20

Wiktor Stribiżew · Accepted Answer · 2018-07-18 07:37:48Z

You want to split on one or more whitespace characters that are followed by a date-like pattern, so you may use

re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', s)

See the regex demo

Details

\s+ - 1+ whitespace chars
(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b) - a positive lookahead that makes sure that, immediately to the right of the current location, there are
- \d{2}(?:\d{2})? - 2 or 4 digits
- - - a hyphen
- \d{1,2} - 1 or 2 digits
- -\d{1,2} - again a hyphen and 1 or 2 digits
- \b - a word boundary (if not necessary, remove it, or replace with (?!\d) in case you may have dates glued to letters or other text)
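The breakdown above can also be written as a commented pattern with re.VERBOSE. This is only a sketch: DATE_SPLIT is an illustrative name, not something from the original answer, and the pattern itself is unchanged.

```python
import re

# DATE_SPLIT is a hypothetical name; the pattern is the one explained above.
DATE_SPLIT = re.compile(r"""
    \s+                  # one or more whitespace characters (consumed by the split)
    (?=                  # lookahead: what follows must look like a date
        \d{2}(?:\d{2})?  #   2 or 4 digits
        -\d{1,2}         #   a hyphen, then 1 or 2 digits
        -\d{1,2}         #   again a hyphen and 1 or 2 digits
        \b               #   a word boundary after the last digits
    )
""", re.VERBOSE)

s = "2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(DATE_SPLIT.split(s))
```

Because the lookahead does not consume anything, the date stays at the start of the next item and only the whitespace is removed.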

Python demo:

import re
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
s = "2018-03-14 06:08:18, he went on 2018-03-15 06:08:18, lets play"
print(re.split(rex, s))
# => ['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']
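
The string in the question has a space and a newline before the second date. Since \s+ also matches newlines, the same pattern should handle that case. The attempt from the question returns the input unsplit because ^ and $ require the whole string to be nothing but a time. A quick check, with variable names chosen here for illustration:

```python
import re

data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

# Attempt from the question: ^...$ only matches a string that is a bare time,
# so nothing matches and re.split returns the input as a single item.
attempt = r'^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$'
unsplit = re.split(attempt, data)

# Pattern from this answer: \s+ consumes the " \n" in front of the second date.
rex = r"\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
parts = re.split(rex, data)
print(parts)
```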

NOTE: If there may be no whitespace before the date, in Python 3.7 and newer you may use r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" (note the * quantifier in \s*, which allows zero-length matches; re.split only splits on those from Python 3.7 onward). For older versions, you will need to use a solution such as the one @blhsing suggests, or install the PyPI regex module and use r"(?V1)\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)" with regex.split.
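
One caveat with the zero-width variant, as far as I can tell: with \s* the lookahead can also succeed in the middle of a four-digit year (at "18-03-14"), and zero-length matches produce empty strings in the result. The sketch below assumes Python 3.7+ and adds a (?<!\d) lookbehind plus a filter for empty items. Both are additions for illustration, not part of the original answer.

```python
import re

glued = "2018-03-14 06:08:18, he went on2018-03-15 06:08:18, lets play"

# Pattern from the note: may also split between "20" and "18-03-14".
note_rex = r"\s*(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"
naive = re.split(note_rex, glued)

# Assumed refinement: (?<!\d) forbids a split point directly after a digit.
guarded_rex = r"\s*(?<!\d)(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)"

# Zero-length matches (for example at the very start) yield empty strings; drop them.
parts = [p for p in re.split(guarded_rex, glued) if p]
print(parts)
```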

blhsing · Accepted Answer · 2018-07-18 07:37:27Z

re.split is meant for cases where you have a certain delimiter pattern. Use re.findall with a lookahead pattern instead:

import re
data="2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"
d = r'\d{4}-\d?\d-\d?\d (?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]'
print(re.findall(r'{0}.*?(?=\s*{0}|$)'.format(d), data, re.DOTALL))

This outputs:

['2018-03-14 06:08:18, he went on', '2018-03-15 06:08:18, lets play']

edited Jul 18, 2018 at 7:37

answered Jul 18, 2018 at 7:18

8 Comments

Wiktor Stribiżew Over a year ago

Note that a lazy dot with a lookahead might be too resource consuming since the lookahead pattern is checked after each char after the subpattern before the lazy dot. If the requirement is to split with 1 or more whitespaces that are followed with something like a date, re.split(r'\s+(?=\d{2}(?:\d{2})?-\d{1,2}-\d{1,2}\b)', s) might be a better choice.

Pyd Over a year ago

@blhsing it returns only the last occurrence in my actual data

blhsing Over a year ago

@pyd I see. In case you have a '\n' in the string you just need to add an re.DOTALL flag to findall. I've updated my answer accordingly then.

Pyd Over a year ago

Thank you for the answer @blhsing

blhsing Over a year ago

@pyd You're welcome. In fact, if there's always a '\n' before each date/time, you might as well use str.split('\n') to get what you want.
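
For the sample string from the question, that line-based idea might look like the sketch below. It assumes every record starts on its own line. The strip() calls are an addition here, to drop the trailing space before the newline and to skip blank lines.

```python
data = "2018-03-14 06:08:18, he went on \n2018-03-15 06:08:18, lets play"

# Split on newlines, trim surrounding whitespace, and skip empty lines.
parts = [line.strip() for line in data.split('\n') if line.strip()]
print(parts)
```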
