3

What is the best way to extract values from the following lines using a regex?

Sigma 0.10 index = $5.00
beta .05=$25.00
.35 index (or $12.5)
Gamma 0.07

In each case, I want to extract the numeric value from the line (for example, "0.10" from line 1) and, if available, the dollar amount ("$5.00" for line 1).

3 Answers

4
import re

s = """Sigma 0.10 index = $5.00
beta .05=$25.00
.35 index (or $12.5)
Gamma 0.07"""

# grab every run of digits, dots, and dollar signs
print(re.findall(r'[0-9$.]+', s))

Output:

['0.10', '$5.00', '.05', '$25.00', '.35', '$12.5', '0.07']

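A stricter regex, which requires at least one digit before the decimal point:

print(re.findall(r'[$]?\d+(?:\.\d+)?', s))

Output (note that .05 and .35 are matched without their leading dot):

['0.10', '$5.00', '05', '$25.00', '35', '$12.5', '0.07']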

If you want to match .05 and .35 in full as well:

print(re.findall(r'[$]?(?:\d*\.\d+|\d+)', s))

Output:

['0.10', '$5.00', '.05', '$25.00', '.35', '$12.5', '0.07']
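Since the question asks for the plain values and the dollar amounts separately, here is a minimal sketch that sorts the matches into two lists. It is written for Python 3 and uses a pattern like the last one, with the alternation grouped so the optional $ applies to both branches. The names NUMBER and split_tokens are made up for this illustration.

```python
import re

s = """Sigma 0.10 index = $5.00
beta .05=$25.00
.35 index (or $12.5)
Gamma 0.07"""

# Optional "$", then either a decimal (digits optional before the dot)
# or a plain integer. NUMBER is a hypothetical name for this sketch.
NUMBER = re.compile(r'[$]?(?:\d*\.\d+|\d+)')

def split_tokens(text):
    """Return (values, amounts): plain numbers and "$"-prefixed numbers."""
    tokens = NUMBER.findall(text)
    values = [t for t in tokens if not t.startswith('$')]
    amounts = [t for t in tokens if t.startswith('$')]
    return values, amounts

values, amounts = split_tokens(s)
print(values)   # ['0.10', '.05', '.35', '0.07']
print(amounts)  # ['$5.00', '$25.00', '$12.5']
```

This loses the pairing between a value and its amount, so it only fits if you need the two kinds of numbers independently.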

1

The base regex would be \$?\d+(\.\d+)?, which will get you the numbers. Unfortunately, I know regex from JavaScript/C#, so I'm not sure how to handle multiple lines in Python. It should be a really simple flag, though.

2 Comments

Can you explain why I need the ? at the end?
? means the preceding item is optional: maybe it's there, maybe it isn't. For example, spams? will match 'spam' or 'spams' because the ? comes just after the 's'. However, if we use it after a group (something enclosed in ( )), it applies to the entire group. So (\.\d+)? means: match a decimal point followed by some digits, or nothing at all.
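To make the comment above concrete, here is a small Python 3 sketch of ? applied to a single character and to a group. The variable names are only for illustration. The group is written as non-capturing, (?:...), because re.findall returns only the group's contents when a pattern contains a capturing group.

```python
import re

# "?" after a single character makes only that character optional.
plural = re.compile(r'spams?')
print(bool(plural.fullmatch('spam')))   # True
print(bool(plural.fullmatch('spams')))  # True

# "?" after a group makes the whole group optional.
number = re.compile(r'\$?\d+(?:\.\d+)?')
print(number.fullmatch('25').group())      # 25
print(number.fullmatch('$25.00').group())  # $25.00

# A dot with no digits after it is not a valid fractional part,
# so the whole string does not match.
print(number.fullmatch('25.'))             # None
```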
1

Use \n in the pattern to denote line breaks. The re.MULTILINE flag additionally makes ^ and $ match at the start and end of each line.

source = '''Sigma 0.10 index = $5.00
beta .05=$25.00
.35 index (or $12.5)
Gamma 0.07'''
import re

# only handles the top two lines; extend to taste
rx = re.compile(
    r'Sigma (\d*\.\d+) index = (\$\d*\.\d+)\nbeta (\d*\.\d+).*',
    re.MULTILINE
)

print(rx.search(source).groups())
# prints ('0.10', '$5.00', '.05')

Consider also calling .split('\n') on your input and using several simpler regexps, one per resulting line.
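A rough Python 3 sketch of that split-then-match idea, assuming each line holds at most one plain value and at most one dollar amount. VALUE, AMOUNT, and parse_line are hypothetical names, and the lookbehind in VALUE is my own addition to keep it from matching digits inside a dollar amount.

```python
import re

source = '''Sigma 0.10 index = $5.00
beta .05=$25.00
.35 index (or $12.5)
Gamma 0.07'''

# A number that does not directly follow "$", a digit, or a dot,
# so digits inside a dollar amount are skipped.
VALUE = re.compile(r'(?<![$\d.])(?:\d*\.\d+|\d+)')
# A number with a leading "$".
AMOUNT = re.compile(r'\$(?:\d*\.\d+|\d+)')

def parse_line(line):
    """Return (value, amount) for one line; either may be None."""
    value = VALUE.search(line)
    amount = AMOUNT.search(line)
    return (value.group() if value else None,
            amount.group() if amount else None)

rows = [parse_line(line) for line in source.split('\n')]
for row in rows:
    print(row)
# ('0.10', '$5.00')
# ('.05', '$25.00')
# ('.35', '$12.5')
# ('0.07', None)
```

Handling one line at a time keeps each value paired with its own dollar amount, and a line with no amount simply yields None.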

2 Comments

Splitting the input before parsing seems like a bad idea for both correctness and readability.
@siebz0r: splitting is parsing of newlines that are a part of source format. Seems legit to me.
