1

Assume I have a string as follows:

2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00

Each date is followed by one or more times. Can a regular expression find all the times after each date, as follows?

[('2021/12/23', '13:00','14:00'), ('2021/12/24', '13:00','14:00','15:00')]

I tried the following code in Python, but it returns only the last time for each date:

re.findall(r'(\d+/\d+/\d+)(\s\d+\:\d+)+','2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00')

>>>[('2021/12/23', ' 14:00'), ('2021/12/24', ' 15:00')]

Appendix: The original problem is in fact more complicated. There is other text between the times, and it is difficult to simply replace that text with '':

2021/12/31 14:00 start 15:00 end 17:00 pending 18:00 ok 2021/12/31 14:00 begin 15:00 end  17:00 start 18:00 suspend

So the robust method here is the answer below that uses the PyPI regex library, adapted as follows:

import regex  # third-party module: pip install regex

pattern = regex.compile(r'(?P<date>\d{4}/\d+/\d+)(?:\s?(?P<time>(\d+:\d+))\s+([^\d]+))+')
for m in pattern.finditer('2021/12/31 14:00 start 15:00 end 17:00 pending 18:00 ok 2021/12/30 14:00 begin 15:00 end  17:00 ggh 18:00 suspend'):
    print(m.capturesdict())
>>> {'date': ['2021/12/31'], 'time': ['14:00', '15:00', '17:00', '18:00']}
{'date': ['2021/12/30'], 'time': ['14:00', '15:00', '17:00', '18:00']}

3 Answers

2

Use re.findall:

inp = '2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00'
matches = re.findall(r'\d{4}/\d{2}/\d{2}(?: \d{1,2}:\d{2})*', inp)
print(matches)

This prints:

['2021/12/23 13:00 14:00', '2021/12/24 13:00 14:00 15:00']

Explanation of regex:

\d{4}/\d{2}/\d{2}    match a date in YYYY/MM/DD format
(?: \d{1,2}:\d{2})*  match a space followed by hh:mm time, 0 or more times
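
The question asks for one tuple per date rather than one string per date. A minimal sketch, assuming the date and its times are always separated by whitespace, is to split each match and wrap it in a tuple (the name `result` is only illustrative):

```python
import re

inp = '2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00'

# Each match is one date followed by its times, e.g. '2021/12/23 13:00 14:00'.
matches = re.findall(r'\d{4}/\d{2}/\d{2}(?: \d{1,2}:\d{2})*', inp)

# Splitting on whitespace separates the date from the individual times.
result = [tuple(m.split()) for m in matches]
print(result)
# [('2021/12/23', '13:00', '14:00'), ('2021/12/24', '13:00', '14:00', '15:00')]
```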

2

You can use this findall + split solution:

import re

s = '2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00'

for i in re.findall(r'\d+/\d+/\d+(?:\s\d+\:\d+)+', s):
    print(i.split())

Output:

['2021/12/23', '13:00', '14:00']
['2021/12/24', '13:00', '14:00', '15:00']

\d+/\d+/\d+(?:\s\d+\:\d+)+ matches a date string followed by 1 or more time strings.

You could also use:

print ([i.split() for i in re.findall(r'\d+/\d+/\d+(?:\s\d+\:\d+)+', s)])

To get output:

[['2021/12/23', '13:00', '14:00'], ['2021/12/24', '13:00', '14:00', '15:00']]

2 Comments

Sorry, the case is just an example, it will be much more complex than using split.
Please provide details or more examples in the question. An answer is based on the question asked.

1

You can use the PyPI regex library to get the following to work:

import regex
pattern = regex.compile(r'(?P<date>\d+/\d+/\d+)(?:\s+(?P<time>\d+:\d+))+')
for m in pattern.finditer('2021/12/23 13:00 14:00 2021/12/24 13:00 14:00 15:00'):
    print(m.capturesdict())

Output:

{'date': ['2021/12/23'], 'time': ['13:00', '14:00']}
{'date': ['2021/12/24'], 'time': ['13:00', '14:00', '15:00']}

Because the PyPI regex library does not "forget" the earlier captures of a repeated group, and provided the groups are named, match.capturesdict() returns a dictionary of all named groups with every one of their captures.
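
For comparison, the standard `re` module keeps only the last capture of a repeated group, so a similar result needs two passes. The following is a rough sketch, not part of the answer above: it assumes dates look like `YYYY/M/D` and times like `H:MM`, and the names `date_re`, `time_re`, and `group_times_by_date` are hypothetical. It first locates every date, then collects the times found between that date and the next one, which also copes with other text between the times.

```python
import re

# Assumed formats: dates such as 2021/12/31 and times such as 14:00.
date_re = re.compile(r'\d{4}/\d+/\d+')
time_re = re.compile(r'\b\d{1,2}:\d{2}\b')

def group_times_by_date(s):
    """Return one tuple (date, time, ...) per date found in s."""
    dates = list(date_re.finditer(s))
    result = []
    for i, m in enumerate(dates):
        # This date's segment runs up to the next date, or to the end of s.
        end = dates[i + 1].start() if i + 1 < len(dates) else len(s)
        # Pattern.findall accepts pos/endpos, so no slicing is needed.
        times = time_re.findall(s, m.end(), end)
        result.append((m.group(), *times))
    return result

text = ('2021/12/31 14:00 start 15:00 end 17:00 pending 18:00 ok '
        '2021/12/30 14:00 begin 15:00 end  17:00 ggh 18:00 suspend')
for row in group_times_by_date(text):
    print(row)
```

This trades the single pattern for a little bookkeeping, but it needs no third-party dependency.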

1 Comment

FYI: you can install the PyPI regex module by running pip install regex (or pip3 install regex, depending on the environment) in a terminal/console window.
