1

What is the performance overhead of different methods of creating and filling lists in Python?

2
  • 2
    It's certainly encouraged to write self-answered questions on SO, but the question should look like a normal question. Commented Jan 15, 2018 at 13:33
  • Fixed. I see your point :-) Commented Jan 15, 2018 at 13:42

1 Answer 1

4

The following is intended to measure the overhead of different methods of creating and filling lists in Python. In a real program you would of course do something more meaningful with the data added to the lists.

To test this, I made a bunch of files called test1.py, test2.py, etc. that create and fill a list in different ways. I then ran them with the timeit module:

python -m timeit "from test1 import foo; foo()"

This was tested with Python 3.6.0 on a laptop PC with a 2.4 GHz CPU.

The results are shown best to worst:

Range (237 msec per loop)

def foo():
    a = list(range(10000000))

A question was asked in the comments about the performance of this. Note that this is only useful for filling a list with sequential numbers.

List comprehension (380 msec per loop)

def foo():
    a = [i for i in range(10000000)]

Pre-allocated list (492 msec per loop)

def foo():
    k = 10000000
    a = [0] * k
    for i in range(k):
        a[i] = i

This pre-allocates the entire list so the for-loop merely fills it without calling list-append. Although the append-function has amortized constant computational time, it is not completely free, because the list has to be grown periodically as it becomes full, which requires allocating new memory and copying the contents of the list. Pre-allocating the list avoids the expense of growing the list.

Generator comprehension (573 msec per loop)

def foo():
    a = list((i for i in range(10000000)))

Generator with yield-function (580 msec per loop)

def foo():
    def bar():
        for i in range(10000000):
            yield i

    a = list(bar())

There is a degree of uncertainty in these time measurements and the two generators seemed to use roughly the same amount of time.

List append (827 msec per loop)

def foo():
    a = []
    for i in range(10000000):
        a.append(i)
Sign up to request clarification or add additional context in comments.

8 Comments

How about just using range(10000000)?
In an actual program you would of course do something more meaningful with the elements of the list. This just tests the overhead of different methods of creating and filling the list. (And if we are nitpicking, you would have to do list(range(10000000)) to actually create a list in your counter-example :-)
What speed do you get for list(range(10000000))? Another option: a = [0] * k; a[:] = range(k)
@AndreyF Sure, that's fine, if you just need an iterator, and not an actual list. Unless you're talking about Python 2 range, in which case you need to migrate to Python 3. :)
Yes, you should include list(range()) in your times, this is actually the idiomatic way of doing this!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.