2

Problem: I have the following sample strings:

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

I want them grouped like this (output):

ex1 out : ("00:03:34", "hello!! this is example number 1", "00:04:00")
ex2 out : ("00:07:08", "Hi I am example number 2", None)

Attempts:

I've tried re.split:

time_pat = r"(\d{2}:\d{2}:\d{2})"
re.split(time_pat, ex1)
re.split(time_pat, ex2)

It gives me the following output:

ex1 out : ['', '00:03:34', ' hello!! this is example number 1 ', '00:04:00', '']
ex2 out : ['', '00:07:08', ' Hi I am example number 2']

I can get rid of the empty strings using filter, and the output then looks like this:

ex1 out : ['00:03:34', ' hello!! this is example number 1 ', '00:04:00']
ex2 out : ['00:07:08', ' Hi I am example number 2']

The problem here is that the ex2 output has length 2, not 3 with None as the 3rd element. I know I can append None when the length is 2, but I don't want to do that, and I believe a regular expression can handle it.

I've tried the following regular expressions:

re1 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})"

As is quite obvious, it parses ex1 but not ex2.

re2 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})?"

This parses both, but the 3rd group is always None, since the greedy ".*" in the regular expression consumes the end time pattern.

I've tried a lookahead assertion, but I might have used it incorrectly, since it gave no result. Can anybody help me get the right regular expression here?

1
  • What's your expected output if the input is "Hi I am example number 2"? Commented Mar 28, 2015 at 3:58

2 Answers

3

You could use lookaheads like you suggest, or you could just use non-greedy capturing and an optional group, and anchor the match at the end of the line ($):

import re

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

for ex in [ex1, ex2]:
    mat = re.match(r'(\d{2}:\d{2}:\d{2})\s(.*?)\s*(\d{2}:\d{2}:\d{2})?$', ex)
    if mat:
        print(mat.groups())

Output:

('00:03:34', 'hello!! this is example number 1', '00:04:00')
('00:07:08', 'Hi I am example number 2', None)

Note: This is very close to what you had. I just used non-greedy capturing for the middle group (the ? in (.*?)) and added a $ at the end to require the match to run to the end of the line. Without non-greedy capturing, your optional timestamp at the end would get eaten by the middle group. Without the $ anchor, the regex engine would stop as soon as it could: the non-greedy middle group would match nothing and the optional timestamp would be skipped, since neither of them has to match anything.
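To see what each of the two changes contributes, the variants can be compared side by side on ex1. This is only a minimal sketch, assuming Python 3; the names TS, greedy, lazy_no_anchor and lazy_anchored are made up for illustration.

```python
import re

# Shared timestamp group (illustrative helper name, not from the question).
TS = r"(\d{2}:\d{2}:\d{2})"
ex1 = "00:03:34 hello!! this is example number 1 00:04:00"

# Greedy middle group (the question's re2): .* swallows the trailing timestamp.
greedy = re.match(TS + r"(.*)" + TS + r"?", ex1)

# Non-greedy middle group but no $: the engine stops as early as it can.
lazy_no_anchor = re.match(TS + r"\s(.*?)\s*" + TS + r"?", ex1)

# Non-greedy middle group plus $: the match has to reach the end of the line.
lazy_anchored = re.match(TS + r"\s(.*?)\s*" + TS + r"?$", ex1)

print(greedy.groups())
# ('00:03:34', ' hello!! this is example number 1 00:04:00', None)
print(lazy_no_anchor.groups())
# ('00:03:34', '', None)
print(lazy_anchored.groups())
# ('00:03:34', 'hello!! this is example number 1', '00:04:00')
```

Only the last variant fills all three groups, which is why both the ? and the $ are needed together.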


3 Comments

I suggest changing your regex to r'^(\d{2}:\d{2}:\d{2})?\s*(.*?)\s*(\d{2}:\d{2}:\d{2})?$', because it also handles the input "Hi I am example number 2".
@AvinashRaj are you sure? I don't think I see that in the question -- an input without a leading timestamp.
Thank you for the answer. I had tried non greedy capturing but had not used $. Good you explained.
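
For reference, the variant suggested in the first comment can be checked with a short sketch. It assumes Python 3, and the names pat, samples and results are illustrative only. Whether an input without a leading timestamp needs to be supported is not stated in the question.

```python
import re

# Pattern from the comment: the leading timestamp is optional as well.
pat = re.compile(r"^(\d{2}:\d{2}:\d{2})?\s*(.*?)\s*(\d{2}:\d{2}:\d{2})?$")

samples = [
    "00:03:34 hello!! this is example number 1 00:04:00",
    "00:07:08 Hi I am example number 2",
    "Hi I am example number 2",  # no leading timestamp
]

results = [pat.match(s).groups() for s in samples]
for groups in results:
    print(groups)
# ('00:03:34', 'hello!! this is example number 1', '00:04:00')
# ('00:07:08', 'Hi I am example number 2', None)
# (None, 'Hi I am example number 2', None)
```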
0

Use this pattern to capture instead of splitting:

^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$
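
A minimal sketch of how this pattern could be used, assuming Python 3. The helper name parse is hypothetical. With this pattern the middle group keeps its surrounding whitespace, and a missing end timestamp comes back as an empty string rather than None, so the sketch strips the text and converts the empty string to None to get the output the question asks for.

```python
import re

pattern = re.compile(r"^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$")

def parse(line):
    """Return (start, text, end) with end set to None when absent (hypothetical helper)."""
    m = pattern.match(line)
    if m is None:
        return None
    start, text, end = m.groups()
    # The raw groups look like ('00:07:08', ' Hi I am example number 2', ''),
    # so trim the text and turn the empty third group into None.
    return (start, text.strip(), end or None)

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

print(parse(ex1))  # ('00:03:34', 'hello!! this is example number 1', '00:04:00')
print(parse(ex2))  # ('00:07:08', 'Hi I am example number 2', None)
```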

