Could anyone help me with the following scenario?

The NFL season is approaching and I am working on a Python script to scrape spreads off a website for analysis.

Scenario one: the spread comes in the form -3+3

Scenario two: the spread comes in the form -3.5+3.5

import re 

s1 = '-3+3'
s2 = '-3.5+3.5'

search1 = re.search(r'(.\d)(.*)',s1)
search2 = re.search(r'(.\d)(.*)',s2)

print search1.group(1),',',search1.group(2)
print search2.group(1),',',search2.group(2)

>-3 , +3
>-3 , .5+3.5

As you can see, the output of the second scenario chops off everything after the decimal point and attaches it to the front of the next number. Can anyone help me find a solution that works for both situations?

Thanks!

  • Try this: ([-+]?\d+(?:\.\d+)?)([-+]?\d+(?:\.\d+)?) Commented Aug 25, 2016 at 15:18
  • Works perfectly!! thanks Commented Aug 25, 2016 at 15:28
  • Awesome. I put it into an answer. Please vote as accepted if it solved your problem Commented Aug 25, 2016 at 15:54

3 Answers

You can use re.findall() with '(.\d+(?:\.\d+)?)' as your regex, which uses an optional non-capturing group to match the decimal part:

>>> re.findall(r'(.\d+(?:\.\d+)?)', s1)
['-3', '+3']
>>> re.findall(r'(.\d+(?:\.\d+)?)', s2)
['-3.5', '+3.5']
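
Since the goal is analysis, the matched strings will probably need to become numbers. Below is a minimal sketch of that step, written for Python 3. The helper name parse_spreads is hypothetical, and the pattern swaps the leading `.` for a required sign class `[-+]`, which is an assumption about the input: it only fits if every number on the page carries an explicit sign.

```python
import re

# Assumption: every number in the scraped text has an explicit sign.
# [-+] is stricter than ".", which would accept any character before the digits.
SPREAD_RE = re.compile(r'[-+]\d+(?:\.\d+)?')

def parse_spreads(text):
    """Hypothetical helper: return every signed number in text as a float."""
    return [float(m) for m in SPREAD_RE.findall(text)]

print(parse_spreads('-3+3'))        # [-3.0, 3.0]
print(parse_spreads('-3.5+3.5'))    # [-3.5, 3.5]
print(parse_spreads('-10.5+10.5'))  # [-10.5, 10.5]
```

float() accepts a leading + or -, so no extra stripping is needed before the conversion.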

2 Comments

Thank You! that seems to work for all single digit numbers but not with numbers greater than 10
@AlexRosa In that case you just need a + after \d.

As I said in the comments, this regular expression matches a pair of adjacent numbers, each optionally preceded by + or - and optionally followed by a decimal part.

([-+]?\d+(?:\.\d+)?)([-+]?\d+(?:\.\d+)?)

Also, if you are going to be using the same regular expression more than once (and especially if you will be using it dozens or more times), you should compile it before use:

import re
pattern = re.compile(r'([-+]?\d+(?:\.\d+)?)([-+]?\d+(?:\.\d+)?)')
s1 = '-3+3'
s2 = '-3.5+3.5'
search1 = pattern.search(s1)
search2 = pattern.search(s2)
print search1.group(1), "," , search1.group(2)

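Compiling once means the pattern does not have to be looked up again on every call. Note that the re module already caches recently used patterns, so the speedup is usually modest; the main benefit is a reusable, named pattern object.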
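
When scraping, some strings will likely not contain a spread at all, and search() returns None in that case. Here is a rough sketch (Python 3) of wrapping the compiled pattern in a function that handles this. The names PAIR_RE and split_spread are made up for illustration.

```python
import re

# Same two-group pattern as above, compiled once at module level.
PAIR_RE = re.compile(r'([-+]?\d+(?:\.\d+)?)([-+]?\d+(?:\.\d+)?)')

def split_spread(text):
    """Hypothetical helper: return (first, second) as strings, or None if no pair is found."""
    match = PAIR_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(split_spread('-3+3'))      # ('-3', '+3')
print(split_spread('-3.5+3.5'))  # ('-3.5', '+3.5')
print(split_spread('N/A'))       # None
```

One caveat: because both signs are optional, an unsigned string such as '35' is split into ('3', '5'). If the site always includes the signs, changing [-+]? to [-+] avoids that.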


Here's an example that works with multi-digit numbers:

import re

NUMBERS_RE = r'[-+]\d*\.?\d+'  # raw string; a sign, optional integer part, optional point, then digits

s1 = '-3+3'
s2 = '-3.5+3.5-12.56+300.9998-.2+5'

print re.findall(NUMBERS_RE, s1)
print re.findall(NUMBERS_RE, s2)

This outputs:

['-3', '+3']
['-3.5', '+3.5', '-12.56', '+300.9998', '-.2', '+5']
