I create a long list of Python objects with many identical entries. To save some memory, I want identical elements to share a single object in memory. For example:
In [1]: alist = [(3, 4), (3, 4)]
In [2]: alist[0] == alist[1]
Out[2]: True
In [3]: alist[0] is alist[1]
Out[3]: False
The two list elements are equal, but are stored as two separate objects. In this simple example, the problem can be fixed by
In [4]: alist[1] = alist[0]
In [5]: alist[1] is alist[0]
Out[5]: True
How can this be done in a more general way? The list is not sorted, so identical elements are usually not next to each other. One solution I came up with is this:
g_dupdict = dict()

def dedup(x):
    # Return the canonical copy of x; store x itself on first sight.
    try:
        return g_dupdict[x]
    except KeyError:
        g_dupdict[x] = x
        return x

for k in range(len(alist)):
    alist[k] = dedup(alist[k])
This works, but introduces new problems. It seems silly to use a dict when a set should do, but I don't know how to make that work. The dict also holds an additional reference to each object, so the memory doesn't get freed when the elements are deleted from the list (illustrated below). To work around this, I delete the dict occasionally, but then it must be re-created whenever new elements are added to the list. Is there a better solution to this problem? Thanks.
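To make the extra-reference problem concrete, here is a minimal sketch (the element count, variable names, and runtime-built tuples are just for illustration, and exact reference counts are CPython implementation details):

import sys

g_dupdict = dict()

def dedup(x):
    # Return the canonical copy of x; store x itself on first sight.
    try:
        return g_dupdict[x]
    except KeyError:
        g_dupdict[x] = x
        return x

# Build equal-but-distinct tuples at runtime so the compiler cannot
# fold them into one shared constant.
alist = [tuple([3, 4]) for _ in range(1000)]
alist = [dedup(t) for t in alist]

survivor = alist[0]
alist.clear()                     # the list no longer references the tuple
print(sys.getrefcount(survivor))  # still > 2: g_dupdict holds it as key and value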
You can make the dedup function more efficient by using g_dupdict.setdefault(x, x).
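With that suggestion, the function might collapse to a single dict call, roughly like this (a sketch; the behavior should match the try/except version):

g_dupdict = dict()

def dedup(x):
    # setdefault returns the stored object if an equal key already exists,
    # otherwise it stores x as its own canonical copy and returns it.
    return g_dupdict.setdefault(x, x)

setdefault also needs only one dict operation per call, whereas the try/except version does a failed lookup followed by an insertion whenever the key is new.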