
I'm working on a script that determines whether my string would be a valid variable name. It's very basic, but I can't figure out how to do it with a regular expression.

So basically I want:

A-Z
a-z
0-9
no whitespace anywhere
no special characters except _

Is that possible? This is what I tried:

re.match("[a-zA-Z0-9_,/S]*$", char_s):
  • You need to anchor your pattern with ^ at the front so that a letter (not a digit) occurs at the beginning... Commented Sep 30, 2013 at 23:17

4 Answers


A pattern like this should work:

^[a-zA-Z_][a-zA-Z0-9_]*$

Or more simply:

^(?!\d)\w+$

In both cases, it will match a string that consists of one or more letters, digits or underscores, as long as it doesn't start with a digit.

The (?!…) in the second pattern is a negative look-ahead assertion. It ensures the first character is not a digit. More information can be found in the manual.
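A minimal sketch (Python 3.4+; the sample strings are only illustrations) that checks both patterns side by side. It uses re.fullmatch instead of wrapping the pattern in ^ and $, because $ also matches just before a trailing newline. It also passes re.ASCII so that \w and \d stay ASCII-only, matching the explicit classes of the first pattern.

```python
import re

# Pattern 1: explicit character classes (letter or underscore first).
EXPLICIT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Pattern 2: negative look-ahead rejects a leading digit;
# re.ASCII keeps \w and \d limited to ASCII characters.
LOOKAHEAD = re.compile(r"(?!\d)\w+", re.ASCII)

def matches(pattern, s):
    """Return True only if the whole string s matches pattern."""
    return pattern.fullmatch(s) is not None

samples = ["my_var", "_private", "var2", "2var", "my var", "my-var", "", "abc\n"]
for s in samples:
    print(repr(s), matches(EXPLICIT, s), matches(LOOKAHEAD, s))

# Pitfall with "$": it matches before a trailing newline, so this is True.
print(re.match(r"^\w+$", "abc\n") is not None)
```

Both patterns agree on every sample; only "my_var", "_private" and "var2" are accepted.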


Well, on top of the regular expressions mentioned, you also need to make sure the name is not one of the (Python 2) reserved keywords:

and       del       from      not       while    
as        elif      global    or        with     
assert    else      if        pass      yield    
break     except    import    print              
class     exec      in        raise              
continue  finally   is        return             
def       for       lambda    try

So something like this:

import re

reserved = ["and", "del", "from", "not", "while", "as", "elif", "global", "or", "with", "assert", "else", "if", "pass", "yield", "break", "except", "import", "print", "class", "exec", "in", "raise", "continue", "finally", "is", "return", "def", "for", "lambda", "try"]

def is_valid(keyword):
    return (keyword not in reserved and
            re.match(r"^(?!\d)\w+$", keyword) is not None)  # regex from p.s.w.g's answer

Or, as @nofinator suggests, you can (and probably should) just use keyword.iskeyword().
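A sketch of that suggestion, pairing p.s.w.g's look-ahead regex with keyword.iskeyword() instead of a hand-maintained list (the names NAME_RE and is_valid and the sample strings are illustrative; re.fullmatch needs Python 3.4+). Because iskeyword() consults the running interpreter's own keyword list, it stays correct across versions: print is a keyword on Python 2 but not on Python 3.

```python
import keyword
import re

# Look-ahead regex from p.s.w.g's answer; fullmatch checks the whole string
# (unlike "$", it does not accept a trailing newline).
NAME_RE = re.compile(r"(?!\d)\w+", re.ASCII)

def is_valid(name):
    """True if name looks like an identifier and is not a reserved keyword."""
    return NAME_RE.fullmatch(name) is not None and not keyword.iskeyword(name)

for name in ["total", "for", "lambda", "for_each", "9lives"]:
    print(name, is_valid(name))
```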

3 Comments

You could also use keyword.iskeyword(). See docs.python.org/2/library/keyword.html#keyword.iskeyword
@nofinator Nice, did not know that.
Yeah, this is what I used for the keyword check.
re.match(r"^[^\W\d]\w*$", char_s):

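The word character class \w is equivalent to [a-zA-Z0-9_] in Python 2 by default (and in Python 3 only with the re.ASCII flag; for Python 3 str patterns it also matches non-ASCII letters and digits). Identifiers cannot start with a digit, so match [^\W\d] (a word character that is not a digit) for the first character and \w* for the rest of them.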
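A small Python 3 sketch of the same pattern, compiled with and without re.ASCII to show the Unicode caveat (the sample strings are illustrative; fullmatch replaces the ^ and $ anchors). Only the non-ASCII name "café" is treated differently.

```python
import re

# Default for str patterns in Python 3: \w and \d are Unicode-aware.
unicode_re = re.compile(r"[^\W\d]\w*")
# With re.ASCII, \w means exactly [a-zA-Z0-9_] and \d means [0-9].
ascii_re = re.compile(r"[^\W\d]\w*", re.ASCII)

for s in ["_x1", "x", "1x", "café"]:
    print(repr(s), bool(unicode_re.fullmatch(s)), bool(ascii_re.fullmatch(s)))
```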


The correct methods:

Python 2

import re
import keyword
import tokenize

re.match(tokenize.Name+"$", char_s) and not keyword.iskeyword(char_s)

Python 3

import keyword

char_s.isidentifier() and not keyword.iskeyword(char_s)

Note that Python 2's method silently fails on Python 3: there tokenize.Name is r'\w+', so it also accepts strings that start with a digit.
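A minimal Python 3 sketch wrapping that check in a helper (the name is_valid_name and the sample strings are illustrative). Since keyword.iskeyword() reflects the running interpreter, "print" counts as a valid name on Python 3, and str.isidentifier() also accepts non-ASCII letters such as the one in "café".

```python
import keyword

def is_valid_name(s):
    """Python 3: True if s is a legal identifier and not a reserved keyword."""
    return s.isidentifier() and not keyword.iskeyword(s)

for s in ["my_var", "class", "print", "2fast", "café", "my var"]:
    print(repr(s), is_valid_name(s))
```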


When you see this kind of question, the first thing you should ask is "how does Python do it?", because almost all of the time it exposes a method to the user.
