1

I want to do the following with Python:

  1. Validate if a UTF-8 string is an integer.
  2. Validate if a UTF-8 string is a float.
  3. Validate if a UTF-8 string has a length of 1-255.
  4. Validate if a UTF-8 string is a valid date.

I'm totally new to Python and I believe this should be done with regular expressions, except maybe for the last one. Your help is appreciated!

3
  • 1
    Possible duplicate: stackoverflow.com/questions/2103071/… Commented Feb 2, 2010 at 12:27
  • Yeah, that one is so helpful as well. Thanks! Commented Feb 2, 2010 at 15:10
  • If you have a problem and solve it with a regular expression, now you have two problems! Commented Feb 2, 2010 at 15:28

3 Answers

6

Regex is not a good solution here.

  1. Validate if a UTF8 string is an integer:

    try:
      int(val)
      is_int = True
    except ValueError:
      is_int = False
    
  2. Validate if a UTF8 string is a float: same as above, but with float().

  3. Validate if a UTF8 string is of length(1-255):

    is_of_appropriate_length = 1 <= len(val) <= 255
    
  4. Validate if a UTF8 string is a valid date: this is not trivial. If you know the right format, you can use time.strptime() like this:

    # Validate that the date is in the YYYY-MM-DD format.
    import time
    try:
      time.strptime(val, '%Y-%m-%d')
      is_in_valid_format = True
    except ValueError:
      is_in_valid_format = False
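
The four checks above can be collected into small helper functions. This is a minimal sketch, assuming Python 3 and an already-decoded string; the function names and the default YYYY-MM-DD format are illustrative choices, not part of the original answer.

```python
import time


def is_int(val):
    """Return True if val parses as an integer."""
    try:
        int(val)
        return True
    except ValueError:
        return False


def is_float(val):
    """Return True if val parses as a float (this also accepts 'nan' and 'inf')."""
    try:
        float(val)
        return True
    except ValueError:
        return False


def has_valid_length(val, minimum=1, maximum=255):
    """Return True if val has between minimum and maximum characters."""
    return minimum <= len(val) <= maximum


def is_valid_date(val, fmt='%Y-%m-%d'):
    """Return True if val matches fmt and names a real calendar date."""
    try:
        time.strptime(val, fmt)
        return True
    except ValueError:
        return False
```

Returning the boolean directly keeps each check usable inside an if statement, for example `if is_int(val) and has_valid_length(val): ...`.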
    

EDIT: One more thing to note. Since you specifically mention UTF-8 strings, it makes sense to decode them into Unicode first (in Python 3, the raw data would be a bytes object and the decoded result a str). This would be done by:

my_unicode_string = my_utf8_string.decode('utf8')

It is interesting to note that when converting a Unicode string to an integer using int(), for example, you are not limited to the "Western Arabic" numerals used in most of the world. int(u'١٧') and int(u'१७') will both correctly parse as 17, even though they are written in Eastern Arabic and Devanagari numerals respectively.
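
To make the decoding step concrete, here is a small sketch assuming Python 3, where the raw UTF-8 data is a bytes object and the decoded result is a str. The variable names are illustrative.

```python
# Assumption: Python 3, where UTF-8 input arrives as bytes.
my_utf8_bytes = 'café'.encode('utf8')

# Decode once, then validate the resulting text.
my_text = my_utf8_bytes.decode('utf8')

# Length in characters differs from length in bytes for non-ASCII text.
char_count = len(my_text)
byte_count = len(my_utf8_bytes)

# int() accepts decimal digits from other scripts.
eastern_arabic = int('١٧')
devanagari = int('१७')

# Invalid UTF-8 raises UnicodeDecodeError (a subclass of ValueError).
try:
    b'\xff\xfe'.decode('utf8')
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
```

This matters for the length check: a 1-255 limit on characters should be applied to the decoded string, while a limit on storage size should be applied to the bytes.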


2

Why use regex? I'm convinced it would be slower and more cumbersome.

The int() and float() functions, or better yet the isdigit() method, work well here. Note that isdigit() only accepts unsigned digit strings, so it returns False for "-5".

>>> a = "03523"
>>> a.isdigit()
True

>>> b = "963spam"
>>> b.isdigit()
False
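
A few edge cases are worth checking before relying on isdigit() alone. The sketch below, assuming Python 3, compares it with int(); the helper name parses_as_int is hypothetical.

```python
def parses_as_int(val):
    """Return True if int() accepts val."""
    try:
        int(val)
        return True
    except ValueError:
        return False


samples = ['03523', '963spam', '-5', ' 42 ', '²']

# Pair each sample with (isdigit() result, int() result).
results = {s: (s.isdigit(), parses_as_int(s)) for s in samples}
```

Signed and whitespace-padded values are accepted by int() but rejected by isdigit(), while a superscript digit such as '²' passes isdigit() but is rejected by int().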

For question 3, do you mean "Validate if a UTF8 string is a NUMBER of length(1-255)"?

Why not:

def validnumber(n):
    # True only if n parses as an integer in the inclusive range 1-255.
    try:
        return 1 <= int(n) <= 255
    except ValueError:
        return False


1
  1. Use int() and catch the exception.
  2. Use float(), but what do you mean by float?
  3. Use int() and then check the range with an if statement.
  4. Use datetime formatting.
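
For the date check, one possible approach with the datetime module is to try a list of accepted formats in turn. This is a sketch only; the function name parse_date and the format list are assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical list of accepted formats; adjust to the formats you expect.
ACCEPTED_FORMATS = ('%Y-%m-%d', '%d/%m/%Y')


def parse_date(val, formats=ACCEPTED_FORMATS):
    """Return a date for the first matching format, or None if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(val, fmt).date()
        except ValueError:
            continue
    return None
```

Returning the parsed date instead of a boolean lets the caller validate and convert in one step; a result of None means the string matched none of the formats or named an impossible date.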
