Need Help: Python Regex

Question

I'm writing a regex that will parse the string below and STOP exactly at 6.0s. This number, 6.0s, could also be a series of digits like 150 or a decimal like 12.35, and "s" can be any letter. The stopping point is the most important part.

Here's my regex: [\S+\s]+[\d.\d]+[a-z]?

My problem is that my regex keeps going past 6.0s and copies the dashed line and everything after it, all the way to "See".

15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See
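A minimal Python 3 sketch (not part of the original question) that reproduces the overrun on the sample text. The key point is that [\S+\s] is a character class, not a group. It accepts any non-whitespace character, a literal +, or any whitespace character, which covers every character including newlines. The greedy + therefore runs to the end of the text and only backs off as far as the last digit or dot it can find.

```python
import re

# Sample text copied from the question (no trailing newline).
text = (
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See"
)

# The pattern from the question. [\S+\s] matches any single character at all,
# and [\d.\d] is just the set {digits, '.'}, so a lone '.' satisfies it.
m = re.match(r"[\S+\s]+[\d.\d]+[a-z]?", text)

lines = m.group(0).splitlines()
print(len(lines))                       # 3: the match spans all three lines
print(m.group(0).endswith("reserved."))  # True: it stops at the last '.'
```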

str.partition wouldn't work because I'm dealing with a very long text file. That's just the part I'm having trouble with.
– Mike, Commented Dec 21, 2012 at 9:14
This is a little unclear. Do you want the string "15+MM ... 6.0s" as the result?
– jtbandes, Commented Dec 21, 2012 at 9:15

Himanshu · Accepted Answer · 2012-12-21 09:46:57Z

1

How about splitting the string on newlines and matching anything up to a number, optionally followed by a decimal point and digits, and then an optional letter:

import re

s = '''15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See'''
# Only look at the first line; '.' never crosses a newline anyway.
m = re.match(r'.+\d+(\.\d+)?[a-z]?', s.split('\n')[0])
print(m.group(0))

Output:

C:\>python st.py
15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s

Or, perhaps simpler, use the dashes as the delimiter:

import re

# Capture everything before the whitespace that precedes the dashed line.
m = re.match(r'(.*?)\s+-----', s)
print(m.group(1))
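Since the question mentions a very long file, here is a sketch of how the dash-delimiter idea might extend to many records. It assumes, hypothetically, that every wanted line is immediately followed by a dashed separator line. The second record below is invented for illustration.

```python
import re

# Hypothetical excerpt: the first record is from the question, the second is made up.
first = "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s"
second = "20+XX  {ABCD EFG 1234} | APE 7                   12.35"
dashes = "-" * 60
footer = "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See"
text = "\n".join([first, dashes, footer, second, dashes, footer])

# re.MULTILINE lets ^ match at the start of every line; (.*?) cannot cross a
# newline, so each capture is one whole line that is followed by a dashed line.
records = re.findall(r"^(.*?)\s+-{5,}", text, re.MULTILINE)
for record in records:
    print(record)
```

Using findall with re.MULTILINE avoids splitting the text yourself, but it only works if the separator line really follows each target line.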

edited Dec 21, 2012 at 9:46

answered Dec 21, 2012 at 9:28

Himanshu



1 Comment

Mike Over a year ago

It has to stop at 6.0s, not 6.

snurre · Accepted Answer · 2012-12-21 09:24:48Z

0

This will match a line that starts with a series of digits, uppercase letters or +, then anything up to a floating-point number followed by an s:

^[0-9A-Z+]+\s+.*\s+[0-9.]+s$

You should also make sure the regex is applied one line at a time, so that it cannot match across line breaks.
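A small Python 3 check of this pattern against the sample text. Applying it line by line with splitlines() is an assumption about how it would be used, not something the answer specifies.

```python
import re

# Sample text from the question.
text = (
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See"
)

# The pattern from the answer; because each line is matched separately,
# ^ and $ refer to that line's start and end.
pattern = re.compile(r"^[0-9A-Z+]+\s+.*\s+[0-9.]+s$")

hits = [line for line in text.splitlines() if pattern.match(line)]
for hit in hits:
    print(hit)  # only the first line qualifies
```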

answered Dec 21, 2012 at 9:24

snurre


1 Comment

Johny Skovdal Over a year ago

@Mike, following your latest edit, wouldn't the above work for you if you made s an optional letter instead ([a-z]?)?
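A sketch of that suggestion, comparing the original pattern with a variant where the final s is replaced by [a-z]?. The values 150 and 12.35 come from the question. Attaching them to the sample prefix is only for illustration.

```python
import re

strict = re.compile(r"^[0-9A-Z+]+\s+.*\s+[0-9.]+s$")       # pattern from the answer
relaxed = re.compile(r"^[0-9A-Z+]+\s+.*\s+[0-9.]+[a-z]?$")  # trailing letter optional

prefix = "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   "

results = {}
for value in ["6.0s", "150", "12.35"]:
    line = prefix + value
    results[value] = (bool(strict.match(line)), bool(relaxed.match(line)))
    print(value, results[value])
# 6.0s (True, True)
# 150 (False, True)
# 12.35 (False, True)
```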

jtbandes · Accepted Answer · 2012-12-21 09:37:47Z

0

Your main problem is that you're using [] to group things. That is a character class (an "any one of these characters" construct); to group, use () instead.
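To illustrate the difference, here is a Python 3 sketch on a made-up sample string. The class [\d.\d] is simply the set of digits plus the dot, so it happily matches a lone period. The group form \d+(?:\.\d+)? requires digits with an optional decimal part.

```python
import re

sample = "All rights reserved. See 6.0s and 12.35"  # hypothetical test string

# Character class: any run of digits and/or dots, including a bare '.'.
class_hits = re.findall(r"[\d.\d]+", sample)
# Non-capturing group: digits, optionally followed by '.' and more digits.
group_hits = re.findall(r"\d+(?:\.\d+)?", sample)

print(class_hits)  # ['.', '6.0', '12.35']
print(group_hits)  # ['6.0', '12.35']
```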

Instead, try something like ^\S+\s.+\d+(?:\.\d+)?[a-z]?$. The ^ and $ anchor the start and end of a line (in Python this needs the re.MULTILINE flag, or matching one line at a time), and it sounds like you don't need capture groups at all.
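One Python-specific detail is worth checking. ^ and $ only match at line boundaries when the re.MULTILINE flag is set; without it they anchor the whole string. A minimal sketch on the sample text:

```python
import re

text = (
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See"
)
pattern = r"^\S+\s.+\d+(?:\.\d+)?[a-z]?$"

# Without the flag, $ only matches at the very end of the string, which ends in "See".
without = re.search(pattern, text)
print(without)  # None

# With re.MULTILINE, $ can match at the end of the first line.
m = re.search(pattern, text, re.MULTILINE)
print(m.group(0))
```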

edited Dec 21, 2012 at 9:37

answered Dec 21, 2012 at 9:18

jtbandes


1 Comment

Johny Skovdal Over a year ago

@jtbandes: Remember to make the decimal part optional (?:\.\d+)?.

hochl · Accepted Answer · 2012-12-21 09:43:53Z

0

You have not specified anything about the text in front of your 6.0s group, so there is no reasonable way to build reliable regular expression parts for it. The only thing that is clearly specified is the end. With that in mind, this example prints the trailing value (such as 6.0s) of every line that ends with one, as in your specification:

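import re

for line in opened_file:
    # Capture an optionally negative number, optional decimals and one letter at the end of the line.
    mat = re.search(r"^.*\s(-?\d+(?:\.\d+)?[a-zA-Z])$", line)
    if mat is not None:
        print(mat.group(1))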

The only assumption is that there is some whitespace in front of the value, which I guessed from what you have tried.
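A runnable sketch of this loop uses io.StringIO as a hypothetical stand-in for the opened file. A real file object yields lines the same way, trailing newlines included, and $ still matches just before that final newline. The 150m and 12.35s lines are invented for illustration.

```python
import io
import re

# Hypothetical file contents: first, fourth and fifth lines from the question, the rest made up.
opened_file = io.StringIO(
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "20+XX  {ABCD} | APE 7   150m\n"
    "30+YY  {EFGH} | APE 8   12.35s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See\n"
)

pattern = re.compile(r"^.*\s(-?\d+(?:\.\d+)?[a-zA-Z])$")

values = []
for line in opened_file:
    mat = pattern.search(line)
    if mat is not None:
        values.append(mat.group(1))

print(values)  # ['6.0s', '150m', '12.35s']
```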

edited Dec 21, 2012 at 9:43

answered Dec 21, 2012 at 9:28

hochl



jackcogdill · Accepted Answer · 2012-12-24 16:17:00Z

0

Does this work for you? I used re.search() because it looks for a match anywhere in the string, whereas re.match() only tries at the very beginning (and this string starts with a newline, which . does not match).

# -*- coding: utf-8 -*-

import re

s = '''
15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See
'''

m = re.search(r'.+\d+(?:\.\d+)?[a-zA-Z]', s)
if m is not None:
    print(m.group(0))

Output:

15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
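Here is a short Python 3 sketch of why re.search() rather than re.match() matters here. This string starts with a newline, re.match() only tries position 0, and . does not match a newline, so re.match() finds nothing.

```python
import re

# Same shape as the string in the answer: it begins with a newline.
s = (
    "\n"
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See\n"
)
pattern = r".+\d+(?:\.\d+)?[a-zA-Z]"

# re.match is anchored at position 0, where '.' cannot consume the newline.
print(re.match(pattern, s))  # None

# re.search scans forward and succeeds on the first line of real text.
m = re.search(pattern, s)
print(m.group(0))
```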

edited Dec 24, 2012 at 16:17

answered Dec 24, 2012 at 16:09

jackcogdill

