0

I found this regex for integers in the range -2147483648 to 2147483647.

(0|[1-9]{1}[0-9]{0,8}|[1]{1}[0-9]{1,9}|[-]{1}[2]{1}([0]{1}[0-9]{8}|[1]{1}([0-3]{1}[0-9]{7}|[4]{1}([0-6]{1}[0-9]{6}|[7]{1}([0-3]{1}[0-9]{5}|[4]{1}([0-7]{1}[0-9]{4}|[8]{1}([0-2]{1}[0-9]{3}|[3]{1}([0-5]{1}[0-9]{2}|[6]{1}([0-3]{1}[0-9]{1}|[4]{1}[0-8]{1}))))))))|(\+)?[2]{1}([0]{1}[0-9]{8}|[1]{1}([0-3]{1}[0-9]{7}|[4]{1}([0-6]{1}[0-9]{6}|[7]{1}([0-3]{1}[0-9]{5}|[4]{1}([0-7]{1}[0-9]{4}|[8]{1}([0-2]{1}[0-9]{3}|[3]{1}([0-5]{1}[0-9]{2}|[6]{1}([0-3]{1}[0-9]{1}|[4]{1}[0-7]{1})))))))))

It works for -2147483648 but not for 2147483647. The last digit is the problem, no matter which digit it is: only 214748364 is reported as valid.
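A likely cause is the combination of alternation order and missing anchors: the earlier nine-digit branch matches first, and nothing forces the engine to consume the rest of the string. The sketch below uses a shortened stand-in for the pattern above (the variable `body` is an assumption for illustration, with the same branch order but fewer branches).

```javascript
// Shortened stand-in for the pattern in the question (illustration only):
// same alternation order, fewer branches.
const body = "0|[1-9][0-9]{0,8}|1[0-9]{1,9}|-2147483648|\\+?214748364[0-7]";

const unanchored = new RegExp("(" + body + ")");
const anchored = new RegExp("^(?:" + body + ")$");

// Unanchored: the second branch wins and stops after nine digits.
const partial = unanchored.exec("2147483647")[0];
console.log(partial); // "214748364"

// The negative bound starts with "-", so only its own branch can match it.
const negative = unanchored.exec("-2147483648")[0];
console.log(negative); // "-2147483648"

// Anchored: the whole string must be consumed, so the engine backtracks
// into the later ten-digit branch.
const whole = anchored.test("2147483647");
console.log(whole); // true
```

Wrapping the original pattern in `^(?:` and `)$` should have the same effect, assuming the input is a single value rather than free text.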

8
  • 5
    I'm sure there are easier ways to validate this than a regex. Commented Dec 22, 2013 at 21:06
  • 2
    You could just parse to int ... Commented Dec 22, 2013 at 21:06
  • 6
    If someone writes a regex that uses {1} or [-], that's a clear sign that they don't know what they are doing. Avoid. Commented Dec 22, 2013 at 21:09
  • 1
    @milandjukic88 What language are you using? We’ll give you a better way than regular expressions. Commented Dec 22, 2013 at 21:10
  • 2
    @milandjukic88 Well, just set min="-2147..." and max="2147...". If you need to support older browsers, then parseInt(str, 10) the input value and compare it to your limits. Commented Dec 22, 2013 at 21:11

4 Answers

2

***************

Regular expressions are not designed for matching arbitrary numeric ranges.

***************

This matches only 0 - 2147483647; negative numbers are not covered.

First, break into equal length ranges:

0 - 9

10 - 99

100 - 999

1000 - 9999

10000 - 99999

100000 - 999999

1000000 - 9999999

10000000 - 99999999

100000000 - 999999999

1000000000 - 2147483647

Second, break into ranges that yield simple regexes:

0 - 9

10 - 99

100 - 999

1000 - 9999

10000 - 99999

100000 - 999999

1000000 - 9999999

10000000 - 99999999

100000000 - 999999999

1000000000 - 1999999999

2000000000 - 2099999999

2100000000 - 2139999999

2140000000 - 2146999999

2147000000 - 2147399999

2147400000 - 2147479999

2147480000 - 2147482999

2147483000 - 2147483599

2147483600 - 2147483639

2147483640 - 2147483647

Turn each range into a regex:

[0-9]

[1-9][0-9]

[1-9][0-9]{2}

[1-9][0-9]{3}

[1-9][0-9]{4}

[1-9][0-9]{5}

[1-9][0-9]{6}

[1-9][0-9]{7}

[1-9][0-9]{8}

1[0-9]{9}

20[0-9]{8}

21[0-3][0-9]{7}

214[0-6][0-9]{6}

2147[0-3][0-9]{5}

21474[0-7][0-9]{4}

214748[0-2][0-9]{3}

2147483[0-5][0-9]{2}

21474836[0-3][0-9]

214748364[0-7]

Collapse adjacent powers of 10 (note that the collapsed form also admits leading zeros, e.g. 007):

[0-9]{1,9}

1[0-9]{9}

20[0-9]{8}

21[0-3][0-9]{7}

214[0-6][0-9]{6}

2147[0-3][0-9]{5}

21474[0-7][0-9]{4}

214748[0-2][0-9]{3}

2147483[0-5][0-9]{2}

21474836[0-3][0-9]

214748364[0-7]

Combining the regexes above yields:

([0-9]{1,9}|1[0-9]{9}|20[0-9]{8}|21[0-3][0-9]{7}|214[0-6][0-9]{6}|2147[0-3][0-9]{5}|21474[0-7][0-9]{4}|214748[0-2][0-9]{3}|2147483[0-5][0-9]{2}|21474836[0-3][0-9]|214748364[0-7])
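The decomposition above is mechanical, so it can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper `rangePieces` (not part of the original answer) that takes a non-negative upper bound and returns the alternatives for 0 up to that bound, using the same collapsed short-number piece and therefore also admitting leading zeros.

```javascript
// Hypothetical helper: builds the regex alternatives for 0..max.
function rangePieces(max) {
  const digits = String(max);
  const k = digits.length;
  // A single digit when the range is one value, otherwise a character class.
  const cls = (lo, hi) => (lo === hi ? String(lo) : `[${lo}-${hi}]`);
  // n free trailing digits.
  const rest = (n) => (n === 0 ? "" : n === 1 ? "[0-9]" : `[0-9]{${n}}`);

  const pieces = [];
  // All numbers with fewer digits than max (collapsed powers of 10).
  if (k > 1) pieces.push(k === 2 ? "[0-9]" : `[0-9]{1,${k - 1}}`);

  for (let i = 0; i < k; i++) {
    const d = Number(digits[i]);
    const lo = i === 0 && k > 1 ? 1 : 0; // no leading zero on full-length numbers
    const hi = i === k - 1 ? d : d - 1;  // the last position includes the bound itself
    if (hi < lo) continue;               // nothing below this digit
    pieces.push(digits.slice(0, i) + cls(lo, hi) + rest(k - i - 1));
  }
  return pieces;
}

const pieces = rangePieces(2147483647);
const pattern = new RegExp("^(?:" + pieces.join("|") + ")$");
console.log(pieces.join("|"));
```

Joining the pieces with `|` should reproduce the alternatives listed above; the `^(?:...)$` wrapper is an addition of this sketch for validating a whole string.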

Next we'll try factoring out common prefixes using a tree:

Parse into tree based on regex prefixes:

. [0-9]{1,9}

  • 1 [0-9]{9}

  • 2 0 [0-9]{8}

    • 1 [0-3] [0-9]{7}

      • 4 [0-6] [0-9]{6}

        • 7 [0-3] [0-9]{5}

          • 4 [0-7] [0-9]{4}

            • 8 [0-2] [0-9]{3}

              • 3 [0-5] [0-9]{2}

                • 6 [0-3] [0-9]

                  • 4 [0-7]

Turning the parse tree into a regex yields:

([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))

We choose the shorter one as our result.

\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))\b
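The word boundaries make this pattern suited to searching inside text, which is different from validating a single value. The sketch below contrasts the two uses; the `anchored` variant with `^` and `$` is an assumption of this example, not part of the answer.

```javascript
// The factored pattern from the answer, without the surrounding \b.
const body =
  "[0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|" +
  "7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|" +
  "6([0-3][0-9]|4[0-7]))))))))";

const bounded = new RegExp("\\b(" + body + ")\\b"); // finds a number inside text
const anchored = new RegExp("^(" + body + ")$");    // validates a whole string

const upperOk = anchored.test("2147483647");
const upperTooBig = anchored.test("2147483648");
console.log(upperOk, upperTooBig); // true false

// With \b the sign is simply skipped: the digits after "-" still match.
const digitsOfNegative = bounded.exec("-5")[0];
console.log(digitsOfNegative); // "5"

// With anchors the sign makes the whole string invalid for this unsigned pattern.
const negativeWhole = anchored.test("-5");
console.log(negativeWhole); // false
```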


1

I'm answering this question not because I think it's a good idea to do this with a regex but because I may have uncovered a bug in RegexMagic while having it construct this monster for me:

^(?:-?(?:214748364[0-7]|21474836[0-3][0-9]|2147483[0-5][0-9]{2}|214748[0-2][0-9]{3}|21474[0-7][0-9]{4}|2147[0-3][0-9]{5}|214[0-6][0-9]{6}|21[0-3][0-9]{7}|20[0-9]{8}|1[0-9]{9}|[1-9][0-9]{1,8}|[0-9])|-2147483648)$

or, broken down for "legibility":

^
(?:
 -?
 (?:
  214748364[0-7]
 |
  21474836[0-3][0-9]
 |
  2147483[0-5][0-9]{2}
 |
  214748[0-2][0-9]{3}
 |
  21474[0-7][0-9]{4}
 |
  2147[0-3][0-9]{5}
 |
  214[0-6][0-9]{6}
 |
  21[0-3][0-9]{7}
 |
  20[0-9]{8}
 |
  1[0-9]{9}
 |
  [1-9][0-9]{1,8}
 |
  [0-9]
 )
|
 -2147483648
)
$
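One way to gain some confidence in a pattern like this is to compare it against a plain numeric range check on values around the boundaries. The sketch below does that for the one-line form above; the sample list and the names `int32Pattern` and `mismatches` are assumptions chosen for illustration.

```javascript
const int32Pattern = new RegExp(
  "^(?:-?(?:214748364[0-7]|21474836[0-3][0-9]|2147483[0-5][0-9]{2}|" +
  "214748[0-2][0-9]{3}|21474[0-7][0-9]{4}|2147[0-3][0-9]{5}|214[0-6][0-9]{6}|" +
  "21[0-3][0-9]{7}|20[0-9]{8}|1[0-9]{9}|[1-9][0-9]{1,8}|[0-9])|-2147483648)$"
);

// Boundary and near-boundary values, written without leading zeros.
const samples = [
  "0", "-1", "7", "1999999999", "2147483599", "2147483647", "2147483648",
  "2147483650", "-2147483647", "-2147483648", "-2147483649", "10000000000",
];

// Collect every sample where the regex and the numeric check disagree.
const mismatches = samples.filter((s) => {
  const n = Number(s);
  const inRange = n >= -2147483648 && n <= 2147483647;
  return int32Pattern.test(s) !== inRange;
});

console.log(mismatches.length);
```

The samples avoid leading zeros on purpose: a string such as 007 parses to an in-range number but is rejected by this pattern, so the two checks only agree on canonical input.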


1

You can solve this with a regex, but it's really not the right tool. A plain numeric comparison like this is far simpler and more efficient:

// Returns true when num lies within the signed 32-bit integer range.
// Assumes num is already a number; strings are coerced by the comparisons.
function isValid(num)
{
    return num >= -2147483648 && num <= 2147483647;
}

isValid(2147483646);  // true
isValid(-2147483649); // false
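When the value arrives as a string, for example from a form field, the comments under the question suggest `parseInt(str, 10)` followed by a comparison against the limits. A minimal sketch of that idea follows; the helper name `isInt32String` and the preliminary shape check are assumptions of this example.

```javascript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Hypothetical helper (not from the original answer): validates string input.
function isInt32String(str) {
  // Accept only an optional sign followed by digits. Without this check,
  // parseInt would silently accept input such as "12abc" and return 12.
  if (!/^[+-]?[0-9]+$/.test(str)) return false;
  const n = parseInt(str, 10);
  return n >= INT32_MIN && n <= INT32_MAX;
}

console.log(isInt32String("2147483647"));  // true
console.log(isInt32String("2147483648"));  // false
console.log(isInt32String("-2147483648")); // true
console.log(isInt32String("12abc"));       // false
```

Very long digit strings lose precision when parsed, but they still land far outside the range, so the comparison rejects them.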


0

Using as few groups as possible, for readability:

(?:(?:-|\b)(?:1?[0-9]{1,9}|20[0-9]{8}|21[0-3][0-9]{7}|214[0-6][0-9]{6}|2147[0-3][0-9]{5}|21474[0-7][0-9]{4}|214748[0-2][0-9]{3}|2147483[0-5][0-9]{2}|21474836[0-3][0-9]|214748364[0-7])|-2147483648)\b

Using groups like a tree-list to speed up searching:

(?:(?:-|\b)(?:1?[0-9]{1,9}|20[0-9]{8}|21(?:[0-3][0-9]{7}|4(?:[0-6][0-9]{6}|7(?:[0-3][0-9]{5}|4(?:[0-7][0-9]{4}|8(?:[0-2][0-9]{3}|3(?:[0-5][0-9]{2}|6(?:[0-3][0-9]|4[0-7]))))))))|-2147483648)\b

Obviously, it's most efficient to use (?:-|\b)[0-9]{1,10}\b and then attempt to parse the result as an int32, but you may not always have access to a parser (for example, in a text editor's search/replace).
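Where a parser is available, the two-step approach can be sketched as follows: a loose pattern collects candidates of up to ten digits, and a numeric comparison does the actual range check. The sample text and variable names are assumptions, and the pattern here requires at least one digit so that it cannot match an empty string.

```javascript
const text = "a -5 b 2147483648 c 2147483647 d 99999999999";

// Step 1: loose match, an optional minus sign and 1 to 10 digits.
const candidates = text.match(/(?:-|\b)[0-9]{1,10}\b/g) || [];

// Step 2: convert and keep only values inside the int32 range.
const int32s = candidates
  .map(Number)
  .filter((n) => n >= -2147483648 && n <= 2147483647);

console.log(candidates); // [ '-5', '2147483648', '2147483647' ]
console.log(int32s);     // [ -5, 2147483647 ]
```

The eleven-digit number is never a candidate because the closing `\b` cannot match between two digits.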

2 Comments

It's probably just a typo, but your last, "quick and dirty" regex starts with (?:-?|\b), which effectively makes both the minus sign and the word boundary optional. It should be either (?:-|\b) like in the first two regexes, or -?\b.
Right you are. Thanks for taking the time to read so thoroughly.
