sorted() creates an entirely distinct list, so it roughly doubles the RAM needed for a massive list. Use the list's .sort() method instead - that sorts the list in place.
And unless there's something you haven't told us, leave heapq out of it entirely. Putting the entries in a heap serves no purpose I can think of. Just use the list's .append() method to add new entries, and apply .sort(reverse=True) to the list at the end.
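For instance, a minimal sketch of that approach (the sizes here are just illustrative):

```python
import random

# Build the list directly - no heap involved.
data = []
for _ in range(10_000):
    data.append(random.random())

# One in-place, descending sort at the end. No second copy is made,
# unlike sorted(data, reverse=True), which allocates a whole new list.
data.sort(reverse=True)
```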
If you're still running out of RAM, then you simply can't solve this problem entirely in memory, and will need to craft an approach that merges disk files.
LIVING WITH "TOO SMALL" RAM
In the worst case, even the list all by itself can't fit in available memory. You can still create the sorted sequence, but it requires writing sorted chunks to disk and merging them later. For the merging part, heapq is useful. Here's an example:
import pickle
import heapq

MAXPERFILE = 100 # the array will never get bigger than this

def getfname(i):
    return "pickled%d.dat" % i

filenum = 0

def dumptofile(a): # dump the array to file, as pickled data
    global filenum
    fname = getfname(filenum)
    with open(fname, "wb") as f:
        pickle.dump(len(a), f)
        for x in a:
            pickle.dump(x, f)
    filenum += 1

# generate some random data
import random
a = []
for _ in range(1012): # 10 "full" files with some leftovers
    a.append(random.random())
    if len(a) == MAXPERFILE:
        a.sort(reverse=True)
        dumptofile(a)
        del a[:]
if a:
    a.sort(reverse=True)
    dumptofile(a)
print("number of files written:", filenum)

# now merge the files together; first a function
# to generate the file contents, one at a time
def feedfile(i):
    fname = getfname(i)
    with open(fname, "rb") as f:
        count = pickle.load(f)
        for _ in range(count):
            yield pickle.load(f)

for x in heapq.merge(*(feedfile(i) for i in range(filenum)),
                     reverse=True):
    print(x)
Max memory use can be made smaller by decreasing MAXPERFILE, although performance will be better the larger MAXPERFILE is. Indeed, if MAXPERFILE is small enough and the total amount of data is large enough, the merging code may die with an OS "too many open files" error.
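If you do hit that limit, one common workaround (not shown in the code above) is to merge the runs in batches, so only a handful of inputs are consumed at once; with on-disk runs, each merged batch would be written to an intermediate file before the next level of merging opens its files. A rough in-memory sketch of the batching idea, where merge_in_batches is a hypothetical helper:

```python
import heapq

def merge_in_batches(sorted_runs, batch=8):
    # Hypothetical helper: repeatedly merge at most `batch` descending
    # runs at a time, halving (or better) the number of runs per pass,
    # so no more than `batch` inputs are ever merged simultaneously.
    runs = [list(r) for r in sorted_runs]
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + batch], reverse=True))
                for i in range(0, len(runs), batch)]
    return runs[0] if runs else []
```

For example, merge_in_batches([[9, 5, 1], [8, 4], [7, 3, 2]], batch=2) merges the first two runs, then merges that result with the third, never touching more than two runs at once.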
What about generators? Feeding a generator to sorted() doesn't help: sorted(mygenerator) still materializes the entire list in memory to sort it. If you build the list up front (without pointless use of heapq) and then call .sort(), it's exactly the same. sorted(mygenerator) saves a copy compared to sorted(mylist), but if you already have mylist, calling mylist.sort() avoids that memory overhead too.
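To see this concretely - a small sketch showing that sorted() materializes a generator into a full new list regardless:

```python
import random

def floats(n):
    # Lazily yield n random floats; nothing is stored here.
    for _ in range(n):
        yield random.random()

# sorted() must pull every value into a new list before it can sort,
# so peak memory matches building the list yourself and calling .sort().
result = sorted(floats(1000), reverse=True)
```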