10

I am trying to understand the internal working of the in command and index() of the list data structure.

When I say:

if something not in some_list :
    print "do something"

Is it traversing the whole list internally, similar to a for loop or does it use, better approaches like hashtables etc.

Also the index() in lists, gives an error if the item is not present in the list. Is the working of both in and index() the same? If index() is better then is it possible to catch the error when an item is not present and if it is possible, is it good programming?

1
  • Fun illustration of O(n): $ for i in 1 2 3 4 5 ; do python -mtimeit -s "l=range(int(1e$i))" "(-1) not in l" ; done Commented Sep 23, 2014 at 0:15

2 Answers 2

12

Good question! Yes, both methods you mention will iterate the list, necessarily. Python does not use hashtables for lists because there is no restriction that the list elements are hashable.

If you know about "Big O" notation, the list data structure is designed for O(1) access by looking up a known index, e.g. my_list[13]. It is O(n) for membership testing.

There are other data structures which are optimised for O(1) speed for membership testing (i.e. __contains__), namely set and dict. These are implemented with hashtables.

Here is an example of how you can use IPython to verify the time-complexity of sets and lists, to confirm these claims:

In [1]: short_list, long_list = range(1000), range(10000)

In [2]: timeit 'potato' not in short_list
10000 loops, best of 3: 40.9 µs per loop

In [3]: timeit 'potato' not in long_list
1000 loops, best of 3: 440 µs per loop

In [4]: small_set, big_set = set(short_list), set(long_list)

In [5]: timeit 'potato' not in small_set
10000000 loops, best of 3: 72.9 ns per loop

In [6]: timeit 'potato' not in big_set
10000000 loops, best of 3: 84.5 ns per loop
Sign up to request clarification or add additional context in comments.

Comments

1

For lists, both methods (in and index()) iterate over the list to check for the item you're looking for, unfortunately. They will stop iteration as soon as the result of the membership test is known, which means they will iterate to the end if the item is not found.

As far as I know, if you must work with lists, the not in construct is the most Python and the one you should go with (but you should dump those unnecessary parentheses).

If you don't need to specifically use a list, the built-in set type can often work in its place. The set is a data structure similar to the list, but it uses a hashing algorithm to test for the presence of an item, so if you're doing a lot of that kind of work, you may consider switching. Read the docs I've linked to though, because sets are unordered, so they don't support things like slicing or indexing.

Yes, you can plan for times when the item you're checking for is not present in your data structure. You're looking for a Try/Except Block:

example_list = [1,2,3]

try:
     index_of_4 = example_list.index(4)
except ValueError:
     print("Oops! 4 wasn't in the list!")

When you know exceptions may occur in your program, you can wrap the offending code in a block like this to gracefully catch and recover from exceptions. It is indeed good programming practice to recover from errors and exceptions as gracefully as you can, even if that means just printing an error message and exiting.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.