Extracting time with regex from a string

Question

I have scraped some data, and some of the opening hours are in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format, so that the final string looks like this:

Mon - Fri:,10:00 - 19:00

Any help would be appreciated in this regard. I have tried the following:

import re

txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)

But neither this regex nor any of the others I tried worked.

I cannot find a regex that extracts the times from this string. Once I have the regex, I can take it from there. I'd be grateful if you could help me write one. – rex sphinx, Commented May 19, 2020 at 16:11

ggorlen · Accepted Answer · 2020-05-19 17:19:45Z

3

Your regex requires whitespace before the leading digit, which prevents ,10:00 am from matching, and it requires exactly two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
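A quick way to see the difference is to run both patterns through re.findall on the question's string. This is only a sketch for comparison; the variable names are illustrative.

```python
import re

txt = "Mon - Fri:,10:00 am - 7:00 pm"

# Pattern from the question: requires whitespace before the time and
# exactly two hour digits, so ",10:00 am" and " 7:00 pm" both fail.
original = r"\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))"

# Pattern from this answer: optional first hour digit, no required
# prefix character, and (?i) for case-insensitive am/pm.
suggested = r"(?i)(\d?\d:\d\d (?:a|p)m)"

print(re.findall(original, txt))   # []
print(re.findall(suggested, txt))  # ['10:00 am', '7:00 pm']
```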

After that, parse the match with datetime.strptime and convert it to 24-hour (military) time with strftime and the "%H:%M" format string. Invalid times like 10:67 will raise a clear ValueError (if you expect strings that should be skipped rather than rejected, adjust the regex to match only valid 12-hour times).
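The conversion step and the edge cases around midnight and noon can be checked in isolation. The helper name to_24h and the stricter pattern STRICT_12H below are hypothetical illustrations of the "adjust the regex" suggestion, not part of the original answer.

```python
import re
from datetime import datetime

def to_24h(t):
    """Convert a 12-hour time such as '7:00 pm' to 24-hour 'HH:MM'."""
    # %I = hour 1-12, %M = minute, %p = am/pm (parsed case-insensitively)
    return datetime.strptime(t, "%I:%M %p").strftime("%H:%M")

print(to_24h("7:00 pm"))   # 19:00
print(to_24h("12:00 am"))  # 00:00 (midnight)
print(to_24h("12:30 pm"))  # 12:30 (just after noon)

# An out-of-range minute passes the loose regex but fails to parse.
try:
    to_24h("10:67 am")
except ValueError as e:
    print("rejected:", e)

# Hypothetical stricter pattern: only hours 1-12 and minutes 00-59,
# so invalid times are skipped instead of raising.
STRICT_12H = re.compile(r"(?i)\b(?:1[0-2]|0?[1-9]):[0-5]\d ?[ap]m\b")
print(STRICT_12H.findall("Mon - Fri:,10:00 am - 7:00 pm, 10:67 am"))
# ['10:00 am', '7:00 pm']
```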

import re
from datetime import datetime

def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
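Because of the (?i) flag, the same re.sub call should also handle uppercase AM/PM. Note that %H always zero-pads, so a single-digit morning hour comes out as 09. The input string below is a made-up example, not from the question.

```python
import re
from datetime import datetime

def to_military_time(x):
    # x is a re.Match object; x.group() is the full matched time
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

# Hypothetical input with uppercase AM/PM and a one-digit hour
result = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time,
                "Sat:,9:30 AM - 1:00 PM")
print(result)  # Sat:,09:30 - 13:00
```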

edited May 19, 2020 at 17:19

answered May 19, 2020 at 17:10

ggorlen


tmrlvi · Accepted Answer · 2020-05-19 17:02:59Z

1

Your regex looks only for two-digit hours (\d{2}) preceded by whitespace (\s). The following also captures one-digit hours and allows a comma instead of the space:

data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)

However, you might want to accept any punctuation character before the time:

data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
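Writing the punctuation class by hand is error-prone (for example, ,-. inside a class is a range, not three literals). A sketch of the same idea, swapping in string.punctuation with re.escape to build the class, is shown below; the second test string is a hypothetical input where the time follows a parenthesis.

```python
import re
import string

# Delimiter class from the [\s,] pattern above
comma_or_space = r"[\s,](\d{1,2}:\d{2}\s?(?:AM|PM|am|pm))"

# Same idea with every ASCII punctuation character allowed before the
# time; re.escape() escapes ], \, ^ and - so the class stays valid.
any_punct = (
    r"[\s" + re.escape(string.punctuation) + r"]"
    + r"(\d{1,2}:\d{2}\s?(?:AM|PM|am|pm))"
)

for s in ["Mon - Fri:,10:00 am - 7:00 pm", "Sat:(9:30 am - 1:00 pm)"]:
    print(re.findall(comma_or_space, s), re.findall(any_punct, s))
# ['10:00 am', '7:00 pm'] ['10:00 am', '7:00 pm']
# ['1:00 pm'] ['9:30 am', '1:00 pm']
```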

answered May 19, 2020 at 17:02

tmrlvi


Chih Sean Hsu · Accepted Answer · 2020-05-19 17:06:42Z

1

The regex needs to change, like this:

import re

text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm

You need quantifiers such as + and * to tell the regex to match more than one character; \s on its own matches exactly one whitespace character.
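A minimal illustration of what a quantifier changes, plus a small simplification of the class used above: \w already covers digits, so [\d\s\w:] and [\w\s:] match the same characters. The simplified pattern is an editorial variant, not the answer's original.

```python
import re

# Without a quantifier, \w matches exactly one character.
print(re.match(r"\w", "Mon").group())   # M
# + repeats the preceding token one or more times.
print(re.match(r"\w+", "Mon").group())  # Mon

# \w already includes digits, so [\d\s\w:] can be written as [\w\s:].
text = "Mon - Fri:,10:00 am - 7:00 pm"
m = re.match(r"\D* - \D*:,([\w\s:]+) - ([\w\s:]+)", text)
print(m.groups())  # ('10:00 am', '7:00 pm')
```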

You can learn more about regex here:

https://regexr.com/

And you can try out regexes online here:

https://regex101.com/

answered May 19, 2020 at 17:06

Chih Sean Hsu


Ionut Ticus · Accepted Answer · 2020-05-19 17:12:05Z

1

Why not use the time module?

import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
print(result)

Output:

Mon - Fri:,10:00 - 19:00

answered May 19, 2020 at 17:12

Ionut Ticus
