3

An HTML form returns a string containing a number entered by a user. How do I use a regular expression to check whether it is a validly formatted number? I do not simply want to strip away commas and see if the result can be cast to int, nor do I like the locale.atoi method, as strings evaluate to numbers even if they are nonsense (e.g. locale.atoi('01,0,0') evaluates to 100).

NB this validation only occurs if the string contains commas

The re pattern should be:

  • The 1st character is 1-9 (not zero).
  • The 2nd and 3rd characters, if present, are 0-9, followed by a comma.
  • Then 3 digits 0-9 and a comma, repeated between 0 and 2 times (999,999,999,999 is the largest number possible in the program).
  • Then finally 3 digits 0-9.

compiled = re.compile("[1-9][0-9]{0,2},(\d\d\d,){0,2}[0-9]{3}")

which is not matching the end of the string correctly, for example:

re.match(compiled, '123,456,78') 

is matching. What have I done wrong?

3
  • You mentioned three digits at the end, but your string has only two digits at the end. Commented Jun 20, 2014 at 9:39
  • Yes, exactly: it should not match, but it does! Commented Jun 20, 2014 at 9:46
  • Why don't you group the whole expression? ([1-9][0-9]{0,2},(\d\d\d,){0,2}[0-9]{3}) Could you post matching examples? Commented Jun 20, 2014 at 9:47

2 Answers

1

More Compact

I would suggest something more compact:

^[1-9][0-9]{0,2}(?:,[0-9]{3}){0,3}$

See the demo

  • The ^ asserts that we are at the beginning of the string
  • [1-9] matches our first digit
  • [0-9]{0,2} matches up to two additional digits
  • (?:,[0-9]{3}) matches a comma followed by three digits...
  • ...and {0,3} repeats that group between zero and three times
  • $ asserts that we are at the end of the string

To validate, you could do:

import re

subject = "123,456,789"  # example input
if re.search(r"^[1-9][0-9]{0,2}(?:,[0-9]{3}){0,3}$", subject):
    print("valid")  # Successful match
else:
    print("invalid")  # Match attempt failed
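
As a slightly fuller sketch of the check above, the pattern can be precompiled and wrapped in a small helper. The helper name `is_grouped_number` and the sample inputs are illustrative assumptions, not something from the original question.

```python
import re

# Hypothetical constant and helper names; the pattern is the one suggested above.
GROUPED_NUMBER = re.compile(r"^[1-9][0-9]{0,2}(?:,[0-9]{3}){0,3}$")


def is_grouped_number(text):
    """Return True if text is 1 to 12 digits with a comma before every group of three."""
    return GROUPED_NUMBER.search(text) is not None


# Try a few well-formed and malformed inputs.
for sample in ["7", "123,456,789", "999,999,999,999",
               "123,456,78", "01,0,0", "1,000,000,000,000"]:
    print(sample, is_grouped_number(sample))
```

One caveat: in Python, `$` also matches just before a trailing newline, so "123,456\n" would pass this check. If that matters, calling `fullmatch` on the compiled pattern instead of `search` should avoid it.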

9 Comments

@Downvoter, care to explain your actions on this compact, working pattern with an explanation and demo?
Very neat. What's the deal with '?:'?
I understand now that you have to say when you are at the beginning/end of the string. In my example where I did not do that does it mean that (\d\d\d,){0,2} matched at both, '123,' and '456,', and [0-9]{3} matched at '123', and '456'? Thanks a lot
@WoodyPride the (?: introduces non-capturing parentheses
Please let me know if you have other questions, not sure I understood. :)
1

If you want to match the full string, make sure to specify the start and end in your regex, i.e.:

re.compile(r"^[1-9][0-9]{0,2},(\d\d\d,){0,2}[0-9]{3}$")

Also, as you will notice, I used a raw string (r prefix) to avoid escaping \.

Edit

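Just to explain what's going on with your regex: re.match only anchors the pattern at the start of the string, not at the end, so it succeeds as soon as some prefix of the string matches. For '123,456,78', [1-9][0-9]{0,2} matches "123", then the comma matches, (\d\d\d,){0,2} matches zero times, and [0-9]{3} matches "456". The match is therefore "123,456", and the trailing ",78" is simply ignored. The smallest pattern this can reduce to is the one where both optional parts match zero times, i.e. "[1-9][0-9]{0},(\d\d\d,){0}[0-9]{3}", which is the same as [1-9],[0-9]{3}.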
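
A quick way to check what the unanchored pattern actually consumes is to print the matched text. This is only a sketch; the variable names are illustrative, and `fullmatch` (available since Python 3.4) is shown as an alternative to adding `^` and `$`, not as part of the original answer.

```python
import re

# The pattern from the question, without anchors.
unanchored = re.compile(r"[1-9][0-9]{0,2},(\d\d\d,){0,2}[0-9]{3}")

# match() only pins the start of the string, so a valid prefix is enough.
m = unanchored.match("123,456,78")
matched_prefix = m.group(0) if m else None
print(matched_prefix)  # 123,456

# With ^ and $ the whole string must fit the pattern.
anchored = re.compile(r"^[1-9][0-9]{0,2},(\d\d\d,){0,2}[0-9]{3}$")
print(anchored.match("123,456,78"))  # None

# fullmatch() applies the same whole-string requirement without explicit anchors.
print(unanchored.fullmatch("123,456,78"))  # None
print(unanchored.fullmatch("123,456,789") is not None)  # True
```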

1 Comment

I think I see the point: in my example (\d\d\d,){0,2} matched at both '123,' and '456,', and [0-9]{3} matched at '123' and '456'. Is that right? I didn't realise you had to specify the beginning and end of the string. The difference is just that the match then happens sequentially along the string, is that right?
