Using C-like arrays in Python

Question

Is the following ever done in Python to minimize the "allocation time" of creating new objects inside a for loop? Or is it considered bad practice, and is there a better alternative?

for row in rows:
    data_saved_for_row = []  # re-initializes every time (takes a while)
    for item in row:
        do_something()
    do_something()

vs. the "C version":

# allocate once, with room for the longest row plus a terminator
data_saved_for_row = [None] * (max(len(row) for row in rows) + 1)
for row in rows:
    for index, item in enumerate(row):
        do_something()
    data_saved_for_row[len(row)] = '\0'  # now we have a crude way of knowing
    do_something_with_row()              # when it ends without having
                                         # to always reinitialize

Normally the second approach seems like a terrible idea, but I've run into situations, when iterating over a million or more items, where the initialization of the row list:

data_saved_for_row = []

has taken a second or more to do.

Here's an example:

>>> import timeit
>>> print(timeit.timeit(stmt="l = list()", number=int(1e8)))
7.77035903931
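
To put that figure in perspective, dividing the total by the number of runs gives the cost of a single list() call. The following is only a rough sketch: per_call_ns is a hypothetical helper name, not something from the question, and the measured numbers will vary by machine and Python version.

```python
import timeit


def per_call_ns(stmt, number=100_000):
    """Average cost of one execution of stmt, in nanoseconds (hypothetical helper)."""
    total_seconds = timeit.timeit(stmt=stmt, number=number)
    return total_seconds / number * 1e9


# The figure quoted in the question: 7.77035903931 s spread over 1e8 calls.
question_ns = 7.77035903931 / 1e8 * 1e9
print(question_ns)  # roughly 78 ns per list() call

# Measure locally; these results are machine-dependent.
for stmt in ("l = list()", "l = []"):
    print(stmt, per_call_ns(stmt))
```

At that rate, a million row initializations would add up to well under a tenth of a second, so it may be worth profiling the whole loop before restructuring it.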

Would end on the “end” of the list, no? Creating a list of a given size (with None values) should not be a bottleneck; if other objects are created, the question changes. In either case, the ‘problem’ could be represented more clearly.
– user2864740, Commented Sep 7, 2019 at 20:19

I think in this kind of situation context is everything. Sometimes, if you need to do something with data_saved_for_row after the iteration, allocating something new is a good idea. In other cases, such as a dynamic programming problem, you can often overwrite what you've already computed.
– Primusa, Commented Sep 7, 2019 at 20:21

A second out of what? Are you saying that list initialization dominates execution time? Because it's hard for me to believe that (unless your code does nothing except list initializations).
– freakish, Commented Sep 7, 2019 at 20:24

Also, what's the point of timing 100 million list initializations?
– freakish, Commented Sep 7, 2019 at 20:31

Green Cloak Guy · Accepted Answer · 2019-09-07 20:36:16Z

If you need this level of performance, you may as well write that part in C yourself and import it with ctypes or something similar. But then, if you're writing this kind of performance-driven application, why are you using Python to do it in the first place?

You can use list.clear() as a middle-ground here, not having to reallocate anything immediately:

data_saved_for_row = []
for row in rows:
    data_saved_for_row.clear()
    for item in row:
        do_something()
    do_something()

but this isn't a perfect solution, as shown by the CPython source for this (comments omitted):

static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }

    return 0;
}

I'm not perfectly fluent in C, but this code looks like it frees the memory held by the list, so that memory will have to be allocated again the next time you add something to the list. This strongly suggests that CPython's built-in list doesn't natively support your approach.
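
One way to observe this from Python, without reading the C source, is to compare sys.getsizeof() before and after clearing. This is a small sketch that assumes CPython; other implementations may report sizes differently.

```python
import sys

data = list(range(1000))
size_full = sys.getsizeof(data)     # list header plus the buffer of item pointers

data.clear()
size_cleared = sys.getsizeof(data)  # after clear(), the item buffer has been released
size_empty = sys.getsizeof([])      # a freshly created empty list, for comparison

print(size_full, size_cleared, size_empty)
```

On CPython the cleared list should report the same size as a brand-new empty list, which is consistent with the buffer being freed rather than kept around for reuse.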

Or you could write your own Python data structure (as a subclass of list, maybe) that implements this paradigm (never actually clearing its own list, but maintaining its own notion of its current length), which might be a cleaner solution to your use case than implementing it in C.
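
A minimal sketch of what such a structure could look like. The class name ReusableBuffer and its methods are hypothetical, and it wraps a list rather than subclassing one to keep the example short; treat it as an illustration of the idea, not a tuned implementation.

```python
class ReusableBuffer:
    """Fixed-capacity buffer that is emptied by rewinding a counter."""

    def __init__(self, capacity):
        self._items = [None] * capacity  # allocated once, up front
        self._length = 0                 # number of slots currently in use

    def reset(self):
        # Logically empty the buffer. Old values stay in their slots until
        # overwritten, so the objects they reference are not released yet.
        self._length = 0

    def append(self, value):
        if self._length == len(self._items):
            raise IndexError("buffer is full")
        self._items[self._length] = value
        self._length += 1

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Only non-negative integer indices are supported in this sketch.
        if not 0 <= index < self._length:
            raise IndexError("index out of range")
        return self._items[index]

    def __iter__(self):
        for i in range(self._length):
            yield self._items[i]


rows = [[1, 2, 3], [4, 5], [6]]
buf = ReusableBuffer(capacity=3)
totals = []
for row in rows:
    buf.reset()                # no new list is created per row
    for item in row:
        buf.append(item * 2)
    totals.append(sum(buf))
print(totals)  # [12, 18, 12]
```

In pure Python the extra method calls may well cost more than the allocation they avoid, so it is worth timing this against the plain `data_saved_for_row = []` version before adopting it.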
