
I have a function gen() that returns a NumPy array of nElements floats. I'm looking for a more Pythonic way (a one-liner?) to do the following:

a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()

This is one way to do it:

a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))

But, understandably, it is slower because it does not preallocate the NumPy array:

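import time
from numpy import array, zeros

# gen() is defined elsewhere and returns nElements floats
nSamples  = 100000
nElements = 100

# One-liner: build a list of rows, then convert it to an array
start = time.time()
a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))
print (time.time() - start)

# Preallocated array filled row by row
start = time.time()
a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()
print (time.time() - start)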

Output (in seconds: the one-liner first, then the preallocated loop):

1.82166719437
0.502261161804

Is there a way to achieve the same one-liner while keeping the preallocated array for speed?

1
  • I'm no great guru of pythonicity, but I would use empty() rather than zeros() to save time, avoiding one useless pass over the entire array. Commented May 9, 2011 at 14:58
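For reference, a minimal sketch of the empty() variant suggested in the comment above. The gen() here is a hypothetical stand-in (the real one isn't shown in the question), the sizes are arbitrary, and range is used where Python 2 would use xrange:

```python
import numpy as np

nSamples, nElements = 1000, 100

def gen():
    # Hypothetical stand-in for the question's gen(): nElements random floats.
    return np.random.rand(nElements)

# empty() allocates without initializing, so the contents are arbitrary
# until every row has been overwritten by the loop below.
a = np.empty((nSamples, nElements))
for i in range(nSamples):
    a[i, :] = gen()
```

This is only safe because the loop writes every row; any row left unfilled would contain garbage rather than zeros.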

2 Answers

8

This may not answer your question directly, but since you mentioned "Pythonic" in the title: please understand that Pythonic doesn't necessarily mean a one-liner, or the cleverest and shortest (keystroke-wise) way of doing something. Quite the contrary: Pythonic code strives for clarity.

In the case of your code, I find:

a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()

much clearer than:

a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))

Hence I wouldn't say the second one is more Pythonic. Probably less so.
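If the goal is mainly a short call site, one possible compromise is to keep the explicit loop but move it into a small helper. This is only a sketch: stack_samples is a hypothetical name, the gen() below is a stand-in for the one in the question, and the helper assumes n_samples is at least 1 and that every call returns an array of the same shape.

```python
import numpy as np

def stack_samples(gen, n_samples):
    """Call gen() n_samples times and return the results as rows of one array."""
    # Use the first sample to learn the row shape and dtype.
    first = np.asarray(gen())
    out = np.empty((n_samples,) + first.shape, dtype=first.dtype)
    out[0] = first
    # Fill the remaining rows in place, one gen() result at a time.
    for i in range(1, n_samples):
        out[i] = gen()
    return out

def gen():
    # Hypothetical stand-in for the question's gen().
    return np.random.rand(100)

a = stack_samples(gen, 1000)
```

The call site is then a single line, while the loop itself stays as readable as before.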


1

I believe this will do what you want:

a = vstack([ gen() for _ in xrange(nSamples) ])

As I don't have access to your gen function, I can't run timing tests. Also, this (like your one-liner) is not as memory-friendly as your for-loop version: the one-liners store all gen() outputs and then construct the array, whereas the for loop only needs one gen() output in memory at a time (along with the NumPy array).
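A different one-liner, not from the original question, that avoids holding all gen() outputs at once is numpy.fromiter with an explicit count, which lets NumPy allocate the output up front. The sketch below uses a hypothetical gen() and arbitrary sizes. It feeds values one element at a time, so it may well be slower than the row-wise loop; I haven't timed it.

```python
from itertools import chain
import numpy as np

nSamples, nElements = 1000, 100

def gen():
    # Hypothetical stand-in for the question's gen().
    return np.random.rand(nElements)

# Lazily yield every element of every gen() result, one sample at a time.
flat = chain.from_iterable(gen() for _ in range(nSamples))

# count tells fromiter the final size, so the output is allocated once.
a = np.fromiter(flat, dtype=float, count=nSamples * nElements).reshape(nSamples, nElements)
```

Only one gen() result is alive at any moment, which is the memory behaviour of the for loop, at the cost of per-element iteration in Python.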

1 Comment

Thanks for the input! This is indeed a bit slower than the for loop, but it works well for my purposes.
