0

I'm writing a regex that will parse the string below and STOP exactly at 6.0s. This value, 6.0s, could also be a series of digits like 150 or a decimal like 12.35, and "s" can be any letter. The stopping point is the most important part.

Here's my regex: [\S+\s]+[\d.\d]+[a-z]?

My problem is that my regex keeps going past 6.0s and copying the dashed line and everything after it, all the way to "See".

15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See
8
  • Can you not just use str.partition ? Commented Dec 21, 2012 at 9:11
  • str.partition wouldn't work because I'm dealing with a very long text file. That's just the part I'm having issue with. Commented Dec 21, 2012 at 9:14
  • This is a little unclear. Do you want the string "15+MM ... 6.0s" as the result? Commented Dec 21, 2012 at 9:15
  • Yes, I want everything from 15+MM to 6.0s Commented Dec 21, 2012 at 9:17
  • what settings have you enabled for your regex? Commented Dec 21, 2012 at 9:40

5 Answers

1

How about splitting the string on newlines and matching anything up to a number, optionally followed by a decimal part and a letter:

import re

s = '''15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
 Copyright 2012 The Boston Series Group, Inc. All rights reserved. See'''
m = re.match(r'.+\d+(\.\d+)?[a-z]?', s.split('\n')[0])
print(m.group(0))

Output :-

C:\>python st.py
15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s

Or perhaps use the dashes as a delimiter (this reuses s from above):

import re

m = re.match(r'(.*?)\s+-----', s)
print(m.group(1))

1 Comment

It has to stop at 6.0s, not 6.
0

This will match a line that starts with a run of digits, uppercase letters or +, then anything up to a floating-point number followed by an s:

^[0-9A-Z+]+\s+.*\s+[0-9.]+s$

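You should also make sure that . cannot match newlines (that is, re.DOTALL is off), so the match cannot run past the end of the line. If you search the whole text at once rather than one line at a time, ^ and $ only anchor at line boundaries when re.MULTILINE is set.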
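As a rough illustration of how the anchors in this pattern interact with Python's flags, here is a minimal sketch. The variable names are made up for the example, and the sample text is the one from the question with the copyright symbol left out. Without re.MULTILINE, ^ and $ anchor only at the start and end of the whole string, so the pattern finds nothing in the three-line text; with the flag, or when each line is tested separately, only the first line matches.

```python
import re

pattern = r"^[0-9A-Z+]+\s+.*\s+[0-9.]+s$"

# Sample text from the question (copyright symbol omitted).
text = (
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "Copyright 2012 The Boston Series Group, Inc. All rights reserved. See\n"
)

# No flags: ^ and $ refer to the whole string, which does not end in "<number>s".
no_flag = re.search(pattern, text)

# re.MULTILINE: ^ and $ also anchor at every line boundary.
per_line = re.search(pattern, text, re.MULTILINE)

# Alternative: test each line on its own, no flag needed.
matching_lines = [line for line in text.splitlines() if re.search(pattern, line)]

print(no_flag)
print(per_line.group(0))
print(matching_lines)
```

Either approach stops at the end of the first line, because . never matches a newline unless re.DOTALL is set.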

1 Comment

Following your (@Mike) latest edit, wouldn't the above work for you if you make s an optional character instead ([a-z]?)?
0

Your main problem is that you're using [] to group things — this is a character class (an "any of these characters" construct). Instead you'll want to use ().

But instead, try something like ^\S+\s.+\d+(?:\.\d+)?[a-z]?$ — the ^ and $ are for the start and end of a line, and it sounds like you don't need capture groups at all.
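To see the difference concretely, here is a small sketch comparing the pattern from the question with the one suggested above. It assumes the whole text is searched in one go with re.MULTILINE (the answer does not say which flags to use), and the variable names are illustrative only.

```python
import re

text = (
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See\n"
)

# [\S+\s] is a character class: "non-whitespace, a literal +, or whitespace",
# which is any character at all, newlines included. The greedy + therefore
# runs across line breaks before backtracking to the last digit or dot.
original = re.match(r"[\S+\s]+[\d.\d]+[a-z]?", text)
overshoots = "\n" in original.group(0)

# Suggested pattern: "." cannot cross a newline, and with re.MULTILINE
# (an assumption here) ^ and $ anchor at the start and end of each line.
fixed = re.search(r"^\S+\s.+\d+(?:\.\d+)?[a-z]?$", text, re.MULTILINE)
first_line = fixed.group(0)

print(overshoots)
print(first_line)
```

The original pattern spans several lines, while the anchored one returns only the line ending in 6.0s.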

1 Comment

@jtbandes: Remember to make the decimal part optional (?:\.\d+)?.
0

You have not specified anything about the text in front of your 6.0s group, so there is no reasonable way to create reliable regular expression parts for it. The only thing that is clearly specified is the end. Having said that, this example would print all lines that end with something like 6.0s as in your specification:

import re

for line in opened_file:
    # Group 1 is the trailing token (e.g. "6.0s"); group 0 is the whole line.
    mat = re.search(r"^.*\s(-?\d+(?:\.\d+)?[a-zA-Z])$", line)
    if mat is not None:
        print(mat.group(1))

The only assumption is that there is some whitespace in front of it which I guessed from what you have tried.
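A self-contained variant of the loop, as a sketch only: io.StringIO stands in for the opened file, the second and third data rows are made up for illustration, and the trailing letter is made optional on the assumption that bare values such as 150 should also count (the question is not explicit about that).

```python
import io
import re

# Assumption: the trailing letter is optional, so "150" and "12.35" also count.
TAIL = re.compile(r"^.*\s(-?\d+(?:\.\d+)?[a-zA-Z]?)$")

# Hypothetical stand-in for the opened file; only the first row is from the question.
sample = io.StringIO(
    "15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s\n"
    "------------------------------------------------------------\n"
    "22+QQ  {made-up row} | APE 2   150\n"
    "31+ZZ  {made-up row} | APE 9   12.35k\n"
)

tails = []
for line in sample:
    mat = TAIL.search(line)  # $ matches just before each line's trailing newline
    if mat is not None:
        tails.append(mat.group(1))

print(tails)
```

The dashed separator line is skipped because it contains no whitespace followed by a number at the end of the line.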


0

Does this work for you? I used re.search() because it scans the whole string for the first match, whereas re.match() only matches at the very start of the string.

# -*- coding: utf-8 -*-

import re

s = '''
15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
------------------------------------------------------------
© Copyright 2012 The Boston Series Group, Inc. All rights reserved. See
'''

m = re.search(r'.+\d+(?:\.\d+)?[a-zA-Z]', s)
if m is not None:
    print(m.group(0))

Output:

15+MM  {NXTW FHR 3153   AB  MABXT YT 197-17 <PA>} | APE 6                   6.0s
