Python: how to split and return a list from a function to avoid memory error

Question

I am currently working with a function that enumerates all cycles within a specific array (a digraph) and I need them all. This function returns all cycles as a list of lists (each sublist being a cycle, e.g. result=[[0,1,0],[0,1,2,0]] is a list containing 2 cycles starting and ending in node 0). However, there are millions of cycles so for big digraphs I get a memory error (MemoryError: MemoryError()) since the list of lists containing all cycles is too big.

I would like that the function splits the result in several arrays so I do not get the memory error. Is that possible? and would that solve the issue?

I tried to do that by splitting the results array as a list of sub-results (the sub-results have a maximum size, say 10 million which is below the 500 million max size stated here: How Big can a Python Array Get? ). The idea is that the result is a list containing sub-results: result=[sub-result1, sub-result2]. However, I get a different memory error: no mem for new parser.

The way I do that is as follows:

if SplitResult == False:
    result = [] # list to accumulate the circuits found
    # append cycles to the result list
    if cycle_found(): #cycle_found() just for example
        result.append(new_cycle)
elif SplitResult == True:
    result = [[]] # list of lists to accumulate the circuits found
    # append cycles to the LAST result SUB-lists
    if cycle_found(): #cycle_found() just for example
        result[len(result)-1].append(new_cycle)
    # create a new sublist when the size of the LAST result SUB-lists
    # reaches the size limit (ResultSize)       
    if len(result[len(result)-1]) == ResultSize:
        result.append([])

Maybe the issue is that I merge all sub-results within the results list. In that case, how can I return a variable number of results from a function?

In particular I divide all simple cycles of a 12 node complete digraph in sublists of 10 million cycles. I know there are 115,443,382 cycles in total, so I should get a list with 16 sublists, the first 15 containing 10 million cycles each and the last one containing 443,382 cycles. Instead of that I get a different memory error: no mem for new parser.

This procedure works for an 11 node complete digraph which returns 2 sublists, the first containing the 10 million cycles (10000000) and the other containing 976184. In case it is of any help, their memory footprint is

>>> sys.getsizeof(cycles_list[0])
40764028
>>> sys.getsizeof(cycles_list[1])
4348732

Then, I guess we should add the size of each cycle listed:

>>> sys.getsizeof(cycles_list[0][4])
56
>>> cycles_list[0][4]
[0, 1, 2, 3, 4, 0]

Any help will be most welcome,

Thanks for reading,

Aleix

I don't know right now; I will learn what a generator function is (I am not a programmer) and come back to you, asap. — Aleix
– Aleix, Commented Jun 13, 2013 at 22:02
@Aleix a generator is sort of like a list, but it computes each value as you need it, instead of giving you all at once. This makes it possible to have very large sequences (even infinite ones!) — kqr
– kqr, Commented Jun 13, 2013 at 22:26
Not just a generator, but you should seriously be looking into numpy for this. — U2EF1
– U2EF1, Commented Jun 14, 2013 at 0:59

Community · Accepted Answer · 2017-05-23 12:29:13Z

Thank you for your suggestions. Indeed the right approach to avoid memory issues when returning arrays is simply by avoiding creating so big result arrays. Thus, generator functions are the way forward.

Generator functions are well explained here: What does the "yield" keyword do in Python? I would just add that a normal function becomes a generator function at the very moment where you add a yield in it. Also, if you add a return statement the generation of iterables will end when reaching it (some generator functions do not have "return" and are thus infinite).

Despite the simple use of generators I had some hard time transforming the original function into a generator function since it was a recursive function (i.e. calling itself). However, this entry shows how a recursive generator function looks like Help understanding how this recursive python function works? and so I could apply it to my function.

Again, thanks to all for your support,

Aleix

Collectives™ on Stack Overflow

Python: how to split and return a list from a function to avoid memory error

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related