-1

As far as I know, in Python, there is no array data structure; the most intuitive solution that most of people often use is the Python list.

The question is, whether the Python list is implemented well enough to enable O(1) access to the 10,000th item in the list?

What is better to do for performance? Shall I use dict or list?

13
  • 3
    List access by index and dictionary access by key are both O(1) in the average case - see wiki.python.org/moin/TimeComplexity. If you will have values for all 10,000 indices, a list makes more sense; if it will be sparse, the dictionary may use less space. Commented Oct 8, 2014 at 10:40
  • 1
    Don't let the name mislead you. Python's list class is a dynamic array, not a linked list. Commented Oct 8, 2014 at 10:44
  • 1
    @TimPietzcker the worst case is every key's hash colliding Commented Oct 8, 2014 at 10:46
  • 2
    @TimPietzcker if you made every key be an object which returns the same hash (easy to do by defining __hash__ on your own class) you would get an O(n) dict :) Commented Oct 8, 2014 at 10:47
  • 1
    No its not easy with integers because you can only have unique keys Commented Oct 8, 2014 at 10:48

3 Answers 3

3

list is a bit of a misnomer: internally, Python's list is an array and not a linked list.

Being an array, list offers constant-time access by index.

Sign up to request clarification or add additional context in comments.

2 Comments

IMHO, Python's array and bytearray types are arrays (which exhibit the characteristic that the values are stored in order; Python lists store references), and list is a more generic term than linked list. For instance the wikipedia articles on the subject agree.
@YannVernier: I think we're splitting hairs. I see it all the time that when someone sees the word "list", they assume a linked list (this question is one such example). I personally think it's not an unreasonable assumption, and is certainly a common one.
1

I made a little benchmark and got into a conclusion that List works faster.

Here is the benchmark I've made:

Running with list: 198sec user time:

> time python /tmp/python_benchmark.py list
finished_building
198.816u 0.036s 3:18.97 99.9%   0+0k 0+0io 0pf+0w

Running with dictionary: 211sec user time:

> time python /tmp/python_benchmark.py dict
finished_building
211.065u 0.044s 3:31.19 99.9%   0+0k 0+8io 0pf+0w

I've printed the finished building string, to know how much approx. seconds it took to build the data structure, before starting to run the queries. In both times, it took less that 1 second.

This is the content of the benchmark script:

import sys
from random import randrange

NUM_OF_ITEMS = 100000
NUM_OF_ACCESSES = 200000000
data_structure = None

def exit_usage():
    print "Usage: python_benchmark.py <list | dict>"
    sys.exit(1)

def access_data_structure():
    for i in xrange(NUM_OF_ACCESSES):
        rnd_index = randrange(NUM_OF_ITEMS)
        dummy_assignment = data_structure[rnd_index]

if len(sys.argv) != 2:
    exit_usage()

work_type = sys.argv[1]
if work_type != 'list' and work_type != 'dict':
    exit_usage()

if work_type == 'list':
    data_structure = []
else:
    data_structure = {}

if work_type == 'list':
    for i in xrange(NUM_OF_ITEMS):
        data_structure.append(i)
else:
    for i in xrange(NUM_OF_ITEMS):
        data_structure[i] = i

print 'finished_building'

access_data_structure()

Comments

1

There are actually several array data types in Python; list is an array of object references, as is tuple, while bytearray and array.array are numeric arrays with specific types and range checks (more generic ones, including multidimensional, are added by NumPy), and aside from tuple these are all dynamic.

dict is a hash table, which is a more complex data type useful primarily when keys are disorganized but hashable and the mapping is queried many times. Its keys are unordered and may be non-numeric.

What the standard library doesn't have, however, is a linked list type. The collections.deque type exhibits the fast end access of a doubly linked list, though insertions in the middle require memory copies (but not linear scans). There are of course addon modules with linked lists also, such as llist.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.