2

I have scraped some data, and some of the opening hours are in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and convert them to 24-hour format. The final string I want is:

Mon - Fri:,10:00 - 19:00

Any help would be appreciated in this regard. I have tried the following:

import re

txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)

But neither this regex nor any other I tried does the job.

2
  • I cannot find a regex to extract the times from this string. Once I have the regex, I can move forward. I would be obliged if you could help with a regex that extracts the times from this string. Commented May 19, 2020 at 16:11
  • 1
    I have posted an attempt. Commented May 19, 2020 at 16:18

4 Answers

3

Your regex requires whitespace before the leading digit, which prevents ,10:00 am from matching, and requires two digits before the colon, which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.

After that, parse the match using datetime.strptime and convert it to 24-hour (military) time using the "%H:%M" format string. Any invalid time that the regex still matches, like 10:67 am, will raise a ValueError (if you anticipate such strings and want them ignored instead, adjust the regex to strictly match valid 12-hour times).

import re
from datetime import datetime

def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")

txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
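As a rough sketch of the stricter variant mentioned above: the pattern below accepts only hours 1-12 and minutes 00-59, so a string like 10:67 am is left untouched instead of raising. The names TIME_12H, to_24_hour and convert_hours are hypothetical, chosen for illustration, and the pattern assumes a single space before am/pm, as in the question.

```python
import re
from datetime import datetime

# Hours 1-12, minutes 00-59, one space, then am/pm (case-insensitive).
TIME_12H = re.compile(r"\b(?:1[0-2]|0?[1-9]):[0-5]\d [ap]m\b", re.IGNORECASE)

def to_24_hour(match):
    # Parse the matched 12-hour time and re-emit it as HH:MM.
    return datetime.strptime(match.group(), "%I:%M %p").strftime("%H:%M")

def convert_hours(text):
    # Replace every valid 12-hour time in the text; leave everything else as-is.
    return TIME_12H.sub(to_24_hour, text)

print(convert_hours("Mon - Fri:,10:00 am - 7:00 pm"))  # Mon - Fri:,10:00 - 19:00
print(convert_hours("Sat:,10:67 am - 7:00 pm"))        # Sat:,10:67 am - 19:00
```

Whether silently skipping a malformed time is preferable to an exception depends on how much you trust the scraped data.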

1

Your regex looks only for two-digit hours (\d{2}) preceded by whitespace (\s). The following also captures one-digit hours and allows a comma in place of the whitespace.

data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)

However, you might want to consider all punctuation as valid:

data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
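If the goal is simply "any non-word character, or the start of the string, before the hour", a word boundary may be a shorter alternative to listing punctuation by hand. This is only a sketch of that idea, not part of the original answer; TIME_RE is a hypothetical name, and re.IGNORECASE stands in for spelling out both cases of am/pm.

```python
import re

txt = "Mon - Fri:,10:00 am - 7:00 pm"

# \b matches between a non-word character (space, comma, ...) and the first digit.
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2}\s?(?:am|pm))", re.IGNORECASE)

data = TIME_RE.findall(txt)
print(data)  # ['10:00 am', '7:00 pm']
```

Note that \b also rejects a time glued to a preceding letter or digit (for example x10:00 am), which the explicit punctuation class rejects as well.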


1

The regex needs to change as follows.

import re

text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm

You need a quantifier such as '+' or '*' to tell the regex to match multiple characters; a bare \s matches only a single character.
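A small illustration of the difference, using the same character class as in the pattern above (the variable names single and run are just for this example):

```python
import re

text = "10:00 am"

# Without a quantifier, the character class matches exactly one character.
single = re.match(r"[\d\s\w:]", text).group()
print(single)  # 1

# With '+', the same class matches a run of one or more characters.
run = re.match(r"[\d\s\w:]+", text).group()
print(run)  # 10:00 am
```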

You can learn more regex here.

https://regexr.com/

And here you can try regex online.

https://regex101.com/


1

Why not use the time module?

import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)

Output:

Mon - Fri:,10:00 - 19:00
