split and group a string based on pattern in python

Question

Problem: I have the following sample strings:

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

I want them grouped like this (output):

ex1 out : ("00:03:34", "hello!! this is example number 1", "00:04:00")
ex2 out : ("00:07:08", "Hi I am example number 2", None)

What I've tried:

I've tried re.split:

import re

time_pat = r"(\d{2}:\d{2}:\d{2})"
re.split(time_pat, ex1)
re.split(time_pat, ex2)

It gives me the following output:

ex1 out : ['', '00:03:34', ' hello!! this is example number 1 ', '00:04:00', '']
ex2 out : ['', '00:07:08', ' Hi I am example number 2']

I can get rid of the empty strings using filter, and the output then looks like this:

ex1 out : ['00:03:34', ' hello!! this is example number 1 ', '00:04:00']
ex2 out : ['00:07:08', ' Hi I am example number 2']

The problem is that the ex2 output has length 2, not 3 with None as the third element. I know I could append None when the length is 2, but I don't want to do that, and I believe a regular expression can handle it.

I've also tried the following regular expressions:

re1 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})"

As is fairly obvious, this parses ex1 but not ex2.

re2 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})?"

This parses both, but the third group is always None, since the greedy ".*" consumes the end-time pattern.

I've tried a lookahead assertion, but I might have used it incorrectly, since it gave no result. Can anybody help me get the regular expression right?

What's your expected output if the input is Hi I am example number 2?
– Avinash Raj, Commented Mar 28, 2015 at 3:58

jedwards · Accepted Answer · 2015-03-28 03:54:09Z

3

You could use lookaheads like you suggest, or you could just use non-greedy capturing, an optional group and specify that you want to match until the end of the line ($):

import re

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

for ex in [ex1, ex2]:
    mat = re.match(r'(\d{2}:\d{2}:\d{2})\s(.*?)\s*(\d{2}:\d{2}:\d{2})?$', ex)
    if mat:
        print(mat.groups())

Output:

('00:03:34', 'hello!! this is example number 1', '00:04:00')
('00:07:08', 'Hi I am example number 2', None)

Note: This is very close to what you had. I just used non-greedy capturing for the middle group (the ? in (.*?)) and added a $ at the end to force the match to extend to the end of the line. Without non-greedy capturing, the optional timestamp at the end would get eaten by the middle group. Without the $ anchor, the regex engine would stop as early as possible: the non-greedy middle group would match nothing and the optional timestamp would be skipped, since neither is required for a successful match.
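To make the note above concrete, here is a small sketch comparing three variants of the pattern on ex1: the greedy version from the question, a non-greedy version without $, and the version used in this answer. The names TS, greedy, lazy_no_anchor and lazy_anchored are illustrative and not from the original post; the f-string syntax assumes Python 3.6 or later.

```python
import re

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"

TS = r"\d{2}:\d{2}:\d{2}"  # timestamp sub-pattern (illustrative name)

# Greedy middle group: (.*) swallows the trailing timestamp.
greedy = re.match(rf"({TS})(.*)({TS})?", ex1).groups()

# Non-greedy but unanchored: the engine stops as early as it can,
# so the middle group is empty and the optional timestamp is skipped.
lazy_no_anchor = re.match(rf"({TS})\s(.*?)\s*({TS})?", ex1).groups()

# Non-greedy and anchored with $: the middle group has to stretch
# until the rest of the pattern can reach the end of the string.
lazy_anchored = re.match(rf"({TS})\s(.*?)\s*({TS})?$", ex1).groups()

print(greedy)          # ('00:03:34', ' hello!! this is example number 1 00:04:00', None)
print(lazy_no_anchor)  # ('00:03:34', '', None)
print(lazy_anchored)   # ('00:03:34', 'hello!! this is example number 1', '00:04:00')
```

Only the last variant combines both changes, which is why it is the one that produces the grouping asked for in the question.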

edited Mar 28, 2015 at 3:54

answered Mar 28, 2015 at 3:49

jedwards


3 Comments

Avinash Raj Over a year ago

I suggest changing your regex to r'^(\d{2}:\d{2}:\d{2})?\s*(.*?)\s*(\d{2}:\d{2}:\d{2})?$', because it also handles the input Hi I am example number 2.

jedwards Over a year ago

@AvinashRaj are you sure? I don't think I see that in the question: an input without a leading timestamp.

Ashwin Rao Over a year ago

Thank you for the answer. I had tried non greedy capturing but had not used $. Good you explained.
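For reference, the variant suggested in the first comment can be checked with a short sketch like the one below. Whether an input without a leading timestamp actually needs to be supported is not settled in the thread, so the third sample is an assumption taken from the comment, and the names pattern, samples and results are illustrative.

```python
import re

# Pattern from the comment: both timestamps are optional.
pattern = re.compile(r"^(\d{2}:\d{2}:\d{2})?\s*(.*?)\s*(\d{2}:\d{2}:\d{2})?$")

samples = [
    "00:03:34 hello!! this is example number 1 00:04:00",
    "00:07:08 Hi I am example number 2",
    "Hi I am example number 2",  # no leading timestamp (case raised in the comment)
]

results = [pattern.match(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('00:03:34', 'hello!! this is example number 1', '00:04:00')
# ('00:07:08', 'Hi I am example number 2', None)
# (None, 'Hi I am example number 2', None)
```

Note that because every part of this pattern is optional, it matches any single-line string, so it cannot be used to reject lines that lack a timestamp.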

alpha bravo · Answer · 2015-03-28 03:49:10Z

0

Use this pattern to capture the groups instead of splitting:

^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$
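A minimal sketch of how this pattern could be applied, assuming Python's re module. As written, the pattern keeps the surrounding whitespace in the middle group and yields an empty string rather than None when the end timestamp is missing, so the hypothetical helper group_line below strips the text and maps '' to None to get the output the question asks for.

```python
import re

# Pattern from this answer, compiled once for reuse.
PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$")

def group_line(line):
    """Return (start, text, end), or None if the line does not match.

    group_line is an illustrative helper, not part of the original answer.
    """
    m = PATTERN.match(line)
    if m is None:
        return None
    start, text, end = m.groups()
    # The third group is '' when there is no end timestamp; normalise to None.
    return (start, text.strip(), end or None)

print(group_line("00:03:34 hello!! this is example number 1 00:04:00"))
print(group_line("00:07:08 Hi I am example number 2"))
```

The post-processing step is a design choice: the alternation with an empty branch always participates in the match, so the group can never be None on its own.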


answered Mar 28, 2015 at 3:49

alpha bravo

