
I have run into an oddity when creating nested lists in Python 2.6.6.

Consider the following two functions:

import time

def lists(n):
    start_time = time.time()
    lists = [None]*n
    for i in xrange(n):
        lists[i] = [None]*n
        for j in xrange(n):
            lists[i][j] = []
    print time.time() - start_time

def simple_lists(n):
    start_time = time.time()
    lists = [None]*n
    for i in xrange(n):
        lists[i] = [None]*n
        for j in xrange(n):
            lists[i][j] = False
    print time.time() - start_time

They both allocate an n*n array. One fills every cell with False, the other with a fresh empty list. Both should run in O(n^2), as far as I can see. However, that does not seem to be the case... Observe the following test runs:

>>> for i in [4000, 8000, 16000]: simple_lists(i)
2.11170578003
8.67467808723
34.0958559513
>>> for i in [1000, 2000, 4000]: lists(i)
1.13742399216
7.39806008339
78.0808939934

As you can see, simple_lists() seems to grow roughly as O(n^2), while lists() seems to grow faster than O(n^3)!

What's going on here? This quirk is completely and utterly ruining the complexity of my code, and I can't explain why it's behaving like this. Does anyone have any ideas?

Edit: List comprehensions seem to cause the same complexity issues.

Redefining lists() like

def lists(n):
    start_time = time.time()
    lists = [[[] for y in xrange(n)] for x in xrange(n)]
    print time.time() - start_time

causes the following results

>>> for i in [1000, 2000, 4000]: lists(i)
0.388785839081
4.45830011368
65.6449248791

... which is still clearly growing faster than O(n^2) (and even faster than O(n^3), sheesh).

Edit 2: After digging into the problem further, it seems the slowdown is caused by the garbage collector. After running gc.disable(), this is the result with the original lists() definition:

>>> for i in [1000, 2000, 4000]: lists(i)
...
0.155457019806
0.616811990738
2.38965821266

which is pretty damn neatly O(n^2).

Moral of the story: don't trust the garbage collector!
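For anyone benchmarking allocation-heavy code, a small helper that pauses the collector only for the duration of the measurement avoids both the skewed timings and leaving GC off globally. This is a sketch in Python 3 syntax (the code above is Python 2); the helper name is my own:

```python
import gc
import time

def timed_without_gc(fn, *args):
    """Run fn(*args) with the cyclic collector paused, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        start = time.time()
        result = fn(*args)
        elapsed = time.time() - start
        return result, elapsed
    finally:
        if was_enabled:
            gc.enable()

# e.g. measure the nested-list build without GC interference:
grid, seconds = timed_without_gc(
    lambda n: [[[] for _ in range(n)] for _ in range(n)], 200
)
```

The stdlib timeit module takes the same approach: it temporarily turns off garbage collection during timing by default.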

  • [[E for y in xrange(N)] for x in xrange(N)] Commented Nov 5, 2011 at 23:03
  • @CatPlusPlus: Tried your suggestion (see edit), didn't help with the complexity issue. Commented Nov 5, 2011 at 23:23
  • Oh, I don't know about complexity, but it's certainly a better syntax. If you want efficient multidimensional arrays of fixed size, you should be using NumPy, anyway. Commented Nov 5, 2011 at 23:26

3 Answers


On my machine

for i in [1000, 2000, 4000]: lists(i)

gives

0.994000196457
4.31200003624
17.9900000095

which is a nice, neat O(n^2). The last run consumes about 1 GB of memory, so lists(8000) thrashes the hard drive and causes my machine to misbehave. delnan is probably right; I'd check your machine's free memory and Python's memory consumption during the operation.
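A quick way to sanity-check that memory figure without external tools is to add up object sizes with sys.getsizeof. This is a rough, CPython-specific lower bound in Python 3 syntax (it counts only the list objects themselves, not malloc overhead); the helper name is my own:

```python
import sys

def approx_grid_bytes(n):
    """Approximate footprint of an n x n grid of distinct empty lists."""
    inner = sys.getsizeof([])           # one empty inner list
    row = sys.getsizeof([None] * n)     # a row holding n pointers
    outer = sys.getsizeof([None] * n)   # the outer list of n row pointers
    return outer + n * row + n * n * inner

print(approx_grid_bytes(4000) / 2**30)  # on the order of 1 GiB on 64-bit builds
```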


Turns out the weird behavior was caused by the garbage collector. Turning it off with gc.disable() yielded the following:

>>> for i in [1000, 2000, 4000]: lists(i)
...
0.155457019806
0.616811990738
2.38965821266

which fits O(n^2) like a glove.
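The reason GC hurts here is that CPython's generational collector triggers after a fixed number of net container allocations, and each collection has to walk the tracked container objects; with millions of small lists alive, every pass gets slower, which is what pushes the build past O(n^2). If disabling GC entirely feels too blunt, raising the collection thresholds is a softer workaround (a sketch in Python 3 syntax; the threshold values chosen here are arbitrary):

```python
import gc

old = gc.get_threshold()          # CPython's defaults are (700, 10, 10)
gc.set_threshold(100000, 50, 50)  # collect far less often while building
try:
    grid = [[[] for _ in range(500)] for _ in range(500)]
finally:
    gc.set_threshold(*old)        # restore the previous thresholds
```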


Generate the inner list once and reuse it, and the construction takes only O(n) time. (Note, though, that every row then refers to the same list object, which may not be what you want.)

import time

def simple_list(n):
    start_time = time.process_time()
    k = [[] for y in range(n)]      # the inner list is built once...
    lists = [k for x in range(n)]   # ...and referenced n times
    end_time = time.process_time()
    print(end_time - start_time)
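Be aware that this answer's speedup comes from aliasing: lists holds n references to the single list k, not n independent rows. A small demonstration of the consequence:

```python
n = 3
k = [[] for y in range(n)]
rows = [k for x in range(n)]  # every entry is the same object k

rows[0][0].append("x")
print(rows[1][0])           # ['x'] -- the append is visible through every row
print(rows[0] is rows[2])   # True
```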
