2

I need to check if a string is hexadecimal. I learnt 2 approaches -

1.) Looping over each character

all(c in string.hexdigits for c in s) # Straight forward with no optimizations

2.) Use int() function to check for an error

try:
    int(s, 16)
    return True
except ValueError:
    return False

In the 1st case I know the complexity is O(n). But what about the 2nd one? What is the time complexity there?

4
  • How large is your data and how often do you expect non-hex digits? Throwing an exception takes some amount of time as well so if you expect to throw a lot of them it won't be very efficient. Commented Oct 17, 2019 at 16:50
  • Strings are not really big. At max 60 characters. Checks are required only once in a flow. But the flow needs to be real swift. This in turn demands us to optimize everything possible. Commented Oct 17, 2019 at 16:53
  • 1
    In terms of complexity it should be the same O(n), where n is the numbers of characters in s. Note that is probably faster to do set(s) < set(string.hexdigits), what I mean is faster to check a set than a string for containment. Commented Oct 17, 2019 at 16:54
  • as for me int() whould need one for-loop to convert it - so it is O(n). But you could also use time to check real time for few example and then see which one can be faster. Probably both will be very similar. Commented Oct 17, 2019 at 16:56

2 Answers 2

5

int(s, 16) will still have complexity O(n), where n == len(s), but the two aren't directly comparable. int will be iterating over the data at a lower level than all, which is faster, but int also does more work (it actually has to compute the integer value of s).

So which is faster? You have to profile both.

In [1]: s = "783c"

In [2]: import string

In [3]: %timeit all(c in string.hexdigits for c in s)
800 ns ± 3.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %%timeit
   ...: try:
   ...:   int(s, 16)
   ...: except ValueError:
   ...:   pass
   ...:
223 ns ± 1.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Looks like internal iteration wins. I tested on an 9-digit string as well, and int was still about 4x faster.

But what about invalid strings?

In [8]: s = 'g'

In [9]: %%timeit
   ...: try:
   ...:   int(s, 16)
   ...: except ValueError:
   ...:   pass
   ...:
1.09 µs ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [10]: %timeit all(c in string.hexdigits for c in s)
580 ns ± 6.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Now, we're basically testing the benefit of short-circuiting versus the cost of catching an exception. What happens if the error occurs later in the string?

In [11]: s = "738ab89ffg"

In [12]: %timeit all(c in string.hexdigits for c in s)
1.59 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [13]: %%timeit
    ...: try:
    ...:   int(s, 16)
    ...: except ValueError:
    ...:   pass
    ...:
1.25 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Now we're seeing the benefit to internal iteration again.

Sign up to request clarification or add additional context in comments.

4 Comments

Out of interest, can you add a version that uses str.translate() to the profile (remove_hex = str.maketrans({c: None for c in string.hexdigits}) and s.translate(remove_hex) == '' as the check)?
Awesome. I could have tried this. Thanks. Can we get to see the implementation of this function somewhere?
...or not, I just did it myself. str.translate() is orders of magnitude slower. Interesting.
@Tomalak It's slower than int, but much closer to it than all(...). However, it's much more verbose than int for no benefit.
2

To check if a string is representing a valid hex number or not means, that you need a bool value. Here you may also use python sets using the issubset() function (Python 2.4 and higher). A run with 10,000 cyles delievers same execution times. Also with longer strings and non hex values. I prefer using sets as you directly get a bool value:

import time

cycles = 100000

hex_set = set('abcdefABCDEF0123456789')
s = 'FFbcdf7340958390230203400348dfsfAF992394787834DE'

start_time = time.time()
for i in range(cycles):
    s_set = set(s)
    ishex = s_set.issubset(hex_set)

end_time = time.time()
mean_time = (end_time - start_time)/cycles
print('Execution time:', mean_time, 'seconds')

start_time = time.time()
for i in range(cycles):
    try:
        ishex = bool(int(s, 16))
    except ValueError:
        ishex = False
        
end_time = time.time()
mean_time = (end_time - start_time)/cycles
print('Execution time:', mean_time, 'seconds')

Execution time: 2.4258804321289062e-06 seconds
Execution time: 2.663140296936035e-06 seconds

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.