3

I need to find numeric ranges in the format "number-number". Each number should be in the range 0-3000, so I came up with this regular expression:

match = re.search(r'^[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]-[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]',sentence)

When I ran the program, I wanted to extract only 56-900 from the sentence, but it also extracted other numbers such as 2016, CLP2012, etc. I want to extract only numbers that have "-" between them. What is wrong with my pattern?

3
  • You should use "[0-9]+-[0-9]+". Commented Oct 9, 2017 at 8:52
  • The | operator has least priority, so even e.g. [1-9][0-9] is an accepted pattern: you should parenthesize the two parts before and after the hyphen. Also, a more compact formulation should exist. Commented Oct 9, 2017 at 8:54
  • Why not "[123]\d{0,3}" (or "[0-3]\d{0,3}" to include 0's)? Commented Oct 9, 2017 at 8:55
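To illustrate the precedence point from the comments, here is a minimal sketch. The sample sentence is made up for illustration, and the grouped pattern only repairs the grouping: it mirrors the question's "one to four digits, no leading zero" idea and does not yet enforce the 0-3000 bound.

```python
import re

# Hypothetical sample text, not taken from the question.
sentence = "In 2016, CLP2012 listed 56-900 items"

# The pattern from the question. Each | splits the WHOLE pattern, so
# "[1-9][0-9]" on its own is a complete alternative with no hyphen in it.
original = (r'^[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]-[1-9]'
            r'|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]')
first_hit = re.search(original, sentence).group()  # "20", taken from "2016"

# Grouping each side of the hyphen keeps the alternation local to one number.
number = r'(?:[1-9][0-9]{0,3})'
grouped = re.compile(r'\b' + number + '-' + number + r'\b')
ranges = grouped.findall(sentence)  # ['56-900']
```

The `\b` word boundaries are what stop the pattern from matching inside a longer token such as CLP2012.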

4 Answers

5

Use the Python package regex_engine to generate regular expressions for numerical ranges.

You can install it with pip:

pip install regex-engine

from regex_engine import generator

generate = generator()

regex = generate.numerical_range(0,3000)

print(regex)

^([0-9]|[2-8][0-9]|1[0-9]|9[0-9]|[2-8][0-9][0-9]|1[1-9][0-9]|10[0-9]|9[0-8][0-9]|99[0-9]|[2-2][0-9][0-9][0-9]|1[1-9][0-9][0-9]|10[1-9][0-9]|100[0-9]|300[0-0])$

You can also generate regexes for floating point and negative ranges

from regex_engine import generator

generate = generator()

regex1 = generate.numerical_range(5,89)
regex2 = generate.numerical_range(81.78,250.23)
regex3 = generate.numerical_range(-65,12)

2 Comments

Weird package. It worked really well, but for whatever reason it adds ^ at the beginning and $ at the end, so Python's re.search fails on substrings...
You can remove those using the replace() method.
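A minimal sketch of that suggestion, assuming the generated string has exactly the shape printed in the answer (a leading ^(, a trailing )$). The pattern is pasted in as a literal so the snippet does not depend on the package, and slicing is used instead of replace() so that only the outer anchors are touched. The sample text is borrowed from another answer on this page.

```python
import re

# Anchored pattern copied from the printed output in the answer above.
anchored = (r"^([0-9]|[2-8][0-9]|1[0-9]|9[0-9]|[2-8][0-9][0-9]|1[1-9][0-9]"
            r"|10[0-9]|9[0-8][0-9]|99[0-9]|[2-2][0-9][0-9][0-9]"
            r"|1[1-9][0-9][0-9]|10[1-9][0-9]|100[0-9]|300[0-0])$")

# Drop the outer "^(" and ")$", then re-wrap as a non-capturing group so
# findall returns the whole match rather than a tuple of groups.
core = "(?:" + anchored[2:-2] + ")"

# Two in-range numbers joined by a hyphen, delimited by word boundaries.
range_pattern = re.compile(r"\b" + core + "-" + core + r"\b")

text = "2016, CLP2012 56-900 3000-3000 4000-4000 123-123 0-0"
matches = range_pattern.findall(text)
# matches == ['56-900', '3000-3000', '123-123', '0-0']
```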
2

If you want to match ranges of integers, you need to protect the matches with r"\b" (word boundary, i.e. the begin/end of a word):

>>> import re

>>> text = "2016, CLP2012 56-900 3000-3000 4000-4000 123-123 0-0"
>>> re.findall(r"\b\d+-\d+\b", text)
['56-900', '3000-3000', '4000-4000', '123-123', '0-0']

If you want to match only integers from 0 to 3000, you need a more precise RegEx, like this:

>>> r = r"(?:3000|[1-2]\d{3}|[1-9]\d{2}|[1-9]\d|\d)"
>>> re.findall(r"\b" + r + "-" + r + r"\b", text)
['56-900', '3000-3000', '123-123', '0-0']

2 Comments

I think '^' for begin and '$' for end is better.
@scriptboy: No, ^ (or $) means "begin" (or "end") of the string, not "begin" (or "end") of a word (a word being what \w+ matches).
0

This code extracts only true ranges x-y where x < y <= 3000:

import re

sentence = 'test 69 example 55-66 example 77-44 example 999-3001 example'

# Find every "digits-digits" candidate, then check the bounds numerically.
for word in re.findall(r'\d+-\d+', sentence):
    low, high = word.split('-')
    if int(low) < int(high) <= 3000:
        print(word)

Output for this example:

55-66

0

For a modern solution, head to my fork of regex-engine, range-ex

pip install range-ex

Pass a minimum and maximum value to the range_regex function to generate a regex that matches numbers in that range. The range is inclusive, meaning both the minimum and maximum values are included in the regex.

from range_ex import range_regex

regex1 = range_regex(5,89)
# ([5-9]|[2-7][0-9]|1[0-9]|8[0-9])

regex2 = range_regex(-65,12)
# (-[1-9]|-[2-5][0-9]|-1[0-9]|-6[0-5]|[0-9]|1[0-2])

Note: This will still find matches in strings like 1234 or abc25def53, so you may want to wrap it in ^ and $ to match the whole string or \b...\b to ensure word boundaries are matched.
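As a rough illustration of that note, the sketch below pastes in the pattern shown in the range_regex(5,89) comment above as a literal, so it does not call the package itself. The sample text is made up.

```python
import re

# Pattern copied from the range_regex(5, 89) comment above.
bare = r"([5-9]|[2-7][0-9]|1[0-9]|8[0-9])"

# Hypothetical sample text.
text = "abc25def53 1234 42"

# Unwrapped, the pattern also picks digits out of abc25def53 and 1234.
loose = re.findall(bare, text)

# With \b on both sides, only the standalone in-range number survives.
strict = re.findall(r"\b" + bare + r"\b", text)  # ['42']

# To validate a whole string, re.fullmatch avoids adding ^ and $ by hand.
whole = re.fullmatch(bare, "42") is not None       # True
too_big = re.fullmatch(bare, "1234") is not None   # False
```

Which wrapper fits depends on whether you are validating a single value (fullmatch or ^...$) or scanning free text (\b...\b).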

If you only pass one of the two arguments, the other will be set to None, which means it will not be constrained. In this case, the regex will match any number that is greater than or equal to the minimum or less than or equal to the maximum.

regex3 = range_regex(minimum=5)
# (([5-9])|[1-9]\\d{1}\\d*)

regex4 = range_regex(maximum=89)
# (-[1-9]\\d*|([0-9]|[2-7][0-9]|1[0-9]|8[0-9]))

This package resolves a bug in regex-engine for negative ranges including 0 and adds the option for unbounded ranges (lower/upper bound only).
