I need to parse the Device Time (i.e. 2012-01-17 13:12:09) from the text below using Python. Could you please tell me how to do this with the standard regular expression library in Python? Thanks.

  <html><head><style type="text/css">h1 {color:blue;}h2 {color:red;}</style>
  <h1>Device #1   Root Content</h1><h2>Device Addr: 127.0.0.1:8080</h1>
  <h2>Device Time: 2012-01-17 13:12:09</h2></body></html>
  • See stackoverflow.com/questions/1732348/… Commented Jan 17, 2012 at 12:48
  • This is invalid HTML btw, no closing tag for head. Commented Jan 17, 2012 at 12:50
  • @Tichodroma I was just about to post that! Commented Jan 17, 2012 at 12:51
  • Why use regex when there are parsers available? Commented Jan 17, 2012 at 12:55
  • I think his context is proper. He is extracting the Device Time, which seems perfectly regular. Commented Jan 17, 2012 at 12:57

4 Answers

Just to add

import re
pattern = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')
first_match = pattern.search(html)
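To make the snippet above runnable end to end, here is a minimal sketch that assumes the page from the question is already held in a string named html, and that guards against search() finding nothing:

```python
import re

# Sample document copied from the question (including its invalid markup).
html = (
    '<html><head><style type="text/css">h1 {color:blue;}h2 {color:red;}</style>\n'
    '<h1>Device #1   Root Content</h1><h2>Device Addr: 127.0.0.1:8080</h1>\n'
    '<h2>Device Time: 2012-01-17 13:12:09</h2></body></html>'
)

pattern = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')
first_match = pattern.search(html)

# search() returns None when nothing matches, so check before calling group().
device_time = first_match.group(1) if first_match else None
print(device_time)
```

group(1) returns the text captured by the parentheses, which here is the whole timestamp.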

1 Comment

Although it happens to work here, it's better to use a raw string for regexes. I've edited your answer accordingly. If you get used to this convention, you can avoid a lot of grief later (for example, when your regex contains \b).
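A small sketch of the \b pitfall mentioned in this comment (the text variable is just an illustrative sample, not from the question):

```python
import re

text = "Device Time: 2012-01-17"

# Without the r prefix, "\b" is a backspace character (0x08), not a word boundary,
# so the pattern looks for backspace + "Time" + backspace.
plain = re.search("\bTime\b", text)

# With the r prefix, the backslash reaches the regex engine intact
# and \b means "word boundary".
raw = re.search(r"\bTime\b", text)

print(plain)
print(raw.group(0))
```

The first search finds nothing because the sample text contains no backspace characters, while the raw-string version matches the word Time.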

Maybe like this:

import re

str = """ Your HTML String here"""

pattern = re.compile(r"""Device Time:([ \d\-:]*)""")
s = pattern.search(str)

time = s.group(1)

3 Comments

How about parsing the time excluding the date? (e.g. 14:00:51) Thanks.
Maybe add: day_time = time.strip().split(' ')[1]
The following does everything:

pattern = re.compile(r"""Device Time:([ \d\-:]*)""")
s = pattern.search(str)
time = s.group(1).strip()
print(time)
pattern = re.compile(r'(\d{4}-\d{2}-\d{2})')
s = pattern.search(time)
date_ = s.group(1)
print(date_)
pattern = re.compile(r'(\d{2}:\d{2}:\d{2})')
s = pattern.search(time)
hour = s.group(1)
print(hour)
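As an alternative to running three separate searches, the date and the time can probably be captured in one pass with named groups. This is only a sketch; the group names date and time and the variable names are my own choices, not part of the answer above:

```python
import re

# Only the relevant line from the question's sample is used here.
html = '<h2>Device Time: 2012-01-17 13:12:09</h2>'

# One pattern, two named groups: the date part and the time-of-day part.
pattern = re.compile(
    r'Device Time:\s*(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})'
)

date_part = None
time_part = None
m = pattern.search(html)
if m:  # search() returns None when there is no match
    date_part = m.group('date')
    time_part = m.group('time')

print(date_part)
print(time_part)
```

This avoids the strip() and split() steps, since the whitespace is consumed by the pattern itself.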

Try this regex

Device Time: ([^<]+)

This returns everything after the words "Device Time: " up to the start of the next HTML tag. As shown in another answer, you could also search for a more specific date-time format.

In general, it's considered bad practice to parse HTML files with regex. However, your example is more like parsing normal text that happens to be part of an HTML file. In this case that's kind of fine... ;-)
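In Python, the regex from this answer could be used roughly like this (a sketch; the html variable holds just the relevant tail of the question's sample):

```python
import re

html = '<h2>Device Time: 2012-01-17 13:12:09</h2></body></html>'

# [^<]+ matches one or more characters that are not "<",
# so the capture stops right before the closing </h2> tag.
match = re.search(r'Device Time: ([^<]+)', html)

# strip() removes any stray whitespace around the captured value.
device_time = match.group(1).strip() if match else None
print(device_time)
```

Note that this captures whatever text sits between the label and the next tag, so it does not validate that the value really looks like a timestamp.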


You need this regex.

/Device Time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/

or this,

/Device Time: (\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)/

Use this regular expression with global switch on.
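For Python specifically: the re module takes the pattern without the surrounding slashes, and there is no global switch; findall() (or finditer()) plays that role. A short sketch, where the second Device Time entry is a made-up addition just to show multiple matches:

```python
import re

html = (
    '<h2>Device Time: 2012-01-17 13:12:09</h2>'
    '<h2>Device Time: 2012-01-17 14:00:51</h2>'  # hypothetical second entry
)

# With the slashes kept, they are treated as literal "/" characters,
# so nothing in the document matches.
with_slashes = re.search(
    r'/Device Time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/', html
)

# Without the slashes the pattern works; findall() returns every match,
# which is the closest equivalent of a "global" flag.
pattern = re.compile(r'Device Time: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')
all_times = pattern.findall(html)

print(with_slashes)
print(all_times)
```

Because the pattern has a single capturing group, findall() returns a list of the captured strings rather than the full matches.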

3 Comments

None is printed when I do this. Any suggestions? Thanks.
Probably because of the /.../gi delimiters/modifiers, which don't work this way in Python.
I am not a Python expert, so I tried to provide a standard regex. Fixed it now.
