So I'm trying to parse a large list of strings with comma-separated values into (a list of) lists. I'm trying to do this "in place", so as not to duplicate an already large object in memory.
Now, ideally, during and after parsing, the only additional memory required would be the overhead of representing the original strings as lists of strings. But what actually happens is much, much worse.
E.g. this list of strings occupies ~1.36 GB of memory:
import psutil
l = [f"[23873498uh3149ubn34, 59ubn23459un3459, un3459-un345, 9u3n45iu9n345, {i}]" for i in range(10_000_000)]
psutil.Process().memory_info().rss / 1024**3
1.3626747131347656
The desired end result of the parsing would take up somewhat more (~1.8 GB):
import psutil
l = [["23873498uh3149ubn34", "59ubn23459un3459", "un3459-un345", "9u3n45iu9n345", str(i)] for i in range(10_000_000)]
psutil.Process().memory_info().rss / 1024**3
1.7964096069335938
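For a single element, the extra per-object overhead of the list-of-strings representation can be estimated with `sys.getsizeof` (a rough sketch only: it counts object headers but not allocator rounding or sharing between objects):

```python
import sys

s = "[23873498uh3149ubn34, 59ubn23459un3459, un3459-un345, 9u3n45iu9n345, 0]"
parts = s.split(", ")

string_bytes = sys.getsizeof(s)
# The list object itself plus each of its five component strings.
list_bytes = sys.getsizeof(parts) + sum(sys.getsizeof(p) for p in parts)
print(string_bytes, list_bytes)
```

The list form carries one object header per substring plus the list header, which is roughly where the 1.36 GB → 1.8 GB difference comes from.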
However, actually parsing the original strings requires a whopping ~5 GB of memory, i.e. considerably more than even the initial strings and the final lists combined:
import psutil
l = [f"[23873498uh3149ubn34, 59ubn23459un3459, un3459-un345, 9u3n45iu9n345, {i}]" for i in range(10_000_000)]
for i, val in enumerate(l):
    l[i] = val.split(", ")
psutil.Process().memory_info().rss / 1024**3
4.988628387451172
Now, I understand that pure Python strings and lists are not terribly memory-efficient, but I fail to understand the huge gap between the memory required for the final result (~1.8 GB) and what is used in the process of getting there (~5 GB).
Can anyone explain what exactly is going on, whether it is possible to modify lists "in place" (freeing the memory of the replaced values as you go), and whether there is a better in-memory way to do this?
Edit: for reference, a plain list of 10 million integers already takes up ~1.7 GB:
numbers = list(range(10_000_000))
psutil.Process().memory_info().rss / 1024**3
1.6968574523925781
Could the integer objects created by using enumerate be contributing to the memory use?