Python: Data validation using regular expression

Question

I am trying to use a Python regular expression to validate the value of a variable.

The validation rules are as follows:

The value can contain any of a-z, A-Z, 0-9 and * (no blank, no -, no ,)
The value can start with a number (0-9) or alphabet (a-z, A-Z) or *
The value can end with a number (0-9) or alphabet (a-z, A-Z) or *
In the middle, the value can contain a number (0-9) or alphabet (a-z, A-Z) or *
Any other values must not be allowed

Currently I am using the following snippet of code to do the validation:

import re
data = "asdsaq2323-asds"
if re.compile("[a-zA-Z0-9*]+").match(data).group() == data:
    print("match")
else:
    print("no match")

I feel there should be a better way of doing the above. I am looking for something like the following:

validate_func(pattern, data)
# returns data if the data passes the validation rules
# returns None if the data does not pass the validation rules
# should not return only the part of the data that matches the validation rules

Does such a built-in function exist?

Does [\w\d\*]+ not suffice? The best way to test is to loop over all the values you already have (in a list or a file?). When a match fails inside the loop, print the value, investigate what went wrong, and see how you can improve your regex.
– user1467267, Commented Mar 22, 2013 at 22:38
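
The loop-based test suggested in this comment could look roughly like the sketch below. The sample values and the helper name find_invalid are made up for illustration. It uses the character class from the question rather than [\w\d\*]+, because \w also accepts the underscore, which the stated rules do not allow.

```python
import re

# Character class from the question, anchored so the whole string must match.
PATTERN = re.compile(r'\A[a-zA-Z0-9*]+\Z')

def find_invalid(values):
    """Return the values that fail validation (hypothetical helper)."""
    return [value for value in values if not PATTERN.match(value)]

# Made-up sample data; in practice this would come from a list or a file.
samples = ["asdsaq2323asds", "asdsaq2323-asds", "*abc*",
           "with space", "under_score", ""]

for bad in find_invalid(samples):
    print("no match: %r" % bad)
```

Printing each failing value makes it easy to see which rule it breaks before changing the pattern.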

ruakh · Accepted Answer · 2013-03-22 22:43:20Z

6

In a regex, the metacharacters ^ and $ mean "start-of-string" and "end-of-string" (respectively); so, rather than seeing what matches, and comparing it to the whole string, you can simply require that the regex match the whole string to begin with:

import re
data = "asdsaq2323-asds"
if re.compile("^[a-zA-Z0-9*]+$").match(data):
    print("match")
else:
    print("no match")

In addition, since you're only using the regex once (you compile it and immediately use it), you can use the convenience function re.match to handle that in a single step:

import re
data = "asdsaq2323-asds"
if re.match("^[a-zA-Z0-9*]+$", data):
    print("match")
else:
    print("no match")
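
If Python 3.4 or later is available, re.fullmatch expresses the same requirement without writing the anchors by hand. A minimal sketch, assuming Python 3; the function name is_valid is only illustrative:

```python
import re

def is_valid(data):
    """Return True if data consists only of a-z, A-Z, 0-9 and *."""
    # re.fullmatch succeeds only when the pattern matches the entire string.
    return re.fullmatch(r'[a-zA-Z0-9*]+', data) is not None

print(is_valid("asdsaq2323-asds"))  # False
print(is_valid("asdsaq2323asds"))   # True
```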


Andrew Clark · Accepted Answer · 2013-03-22 22:43:27Z

3

To make sure the entire string matches your pattern, use beginning and end of string anchors in your regex. For example:

import re
data = "asdsaq2323-asds"
regex = re.compile(r'\A[a-zA-Z0-9*]+\Z')
if regex.match(data):
    print("match")
else:
    print("no match")

Making this a function:

def validate_func(regex, data):
    return data if regex.match(data) else None

Example:

>>> regex = re.compile(r'\A[a-zA-Z0-9*]+\Z')
>>> validate_func(regex, 'asdsaq2323-asds')
>>> validate_func(regex, 'asdsaq2323asds')
'asdsaq2323asds'

As a side note, I prefer \A and \Z over ^ and $ for validation like this, because the meaning of ^ and $ can change depending on the flags used, and $ will also match just before a line break character at the end of the string.
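
The difference is easy to see with a value that ends in a newline. A small sketch, where the sample string is made up for illustration:

```python
import re

data = "asdsaq2323asds\n"  # valid characters followed by a trailing newline

# $ also matches just before a newline at the very end of the string.
dollar = re.match(r'^[a-zA-Z0-9*]+$', data)
# \Z matches only at the very end of the string.
strict = re.match(r'\A[a-zA-Z0-9*]+\Z', data)

print(dollar is not None)  # True: the trailing newline slips through
print(strict is not None)  # False: the newline is rejected
```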


AlwaysBTryin · Accepted Answer · 2013-03-22 22:41:14Z

2

I think you're looking for

re.match('^[a-zA-Z0-9*]+$', data) and data

The extra and data is just to return data, but I'm not sure why you need that. Checking the re.match result against None is enough to check whether the string is valid.
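
Wrapped in the signature the question asks for, the expression might be used as in this sketch. It assumes the caller passes a pattern that is already anchored:

```python
import re

def validate_func(pattern, data):
    """Return data if the whole string matches pattern, otherwise None."""
    # re.match returns None on failure, so "and" short-circuits to None;
    # on success the match object is truthy and "and" yields data.
    return re.match(pattern, data) and data

pattern = r'^[a-zA-Z0-9*]+$'
print(validate_func(pattern, 'asdsaq2323asds'))   # asdsaq2323asds
print(validate_func(pattern, 'asdsaq2323-asds'))  # None
```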
