1

I need to populate a NumPy array, and execution speed is important to me. A dictionary specifies the contents: each key is a value to store in the array, and the dictionary value for that key is how many times it should be repeated.

The script below is my attempt. Testing shows it takes 0.14 seconds to run, but if I remove the hstack it runs in 0.004 s, so I conclude that the repeated concatenation of the array is what takes the time. What's a better method?

Note that the dictionary below is just a test case. In general I will have about 100 different values, and each value will repeat approximately 10,000 times.

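import numpy as np

# Test case: key = value to store, td[key] = how many times to repeat it
td = {}
for ii in range(100):
    td[ii] = 10000+ii
a = np.ones(0)
for aa in td:
    # each hstack call copies the whole array built so far
    a = np.hstack((a,np.ones(td[aa])*aa))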

2 Answers

3

It's almost another 10x faster (than Josh's solution) to just flat out preallocate your memory.

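# total length is known up front, so allocate once
a = np.empty(sum(td.values()))
i = 0
for k, v in td.items():  # items() runs on Python 2 and 3; iteritems() is Python 2 only
    a[i:i+v] = k
    i += v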

Why mess around with intermediate storage when you have enough info at the start to size your array? (np.empty is a quick way to size an array without actually setting any values yet)
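For reference, here is a minimal self-contained sketch of the preallocation idea, using the test dictionary from the question. The helper name `fill_preallocated` is hypothetical, chosen only for illustration, and the sketch assumes Python 3.7+ so that the dictionary is iterated in insertion order.

```python
import numpy as np


def fill_preallocated(counts):
    """Return an array holding each key repeated counts[key] times.

    `fill_preallocated` is a hypothetical helper name, not a NumPy function.
    """
    total = sum(counts.values())
    out = np.empty(total)  # uninitialized float64 buffer; every slot is overwritten below
    start = 0
    for value, n in counts.items():
        out[start:start + n] = value  # broadcast the scalar across the slice
        start += n
    return out


# Test dictionary from the question: key ii is repeated 10000 + ii times
td = {ii: 10000 + ii for ii in range(100)}
a = fill_preallocated(td)
```

Each element is written exactly once, and no intermediate arrays are created or copied.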


1

This code does the same thing but takes 3 ms vs 200 ms on my machine:

import numpy as np

td = {}
for ii in range(100):
    td[ii] = 10000+ii

# build all the pieces first, then concatenate them in a single call
a = np.hstack([np.ones(td[aa])*aa for aa in td])

It calls np.hstack once on a list of arrays rather than repeatedly joining, so the accumulated data is copied once instead of on every iteration. Also, be careful with ordering: before Python 3.7, the order in which you iterate through a dictionary is not guaranteed to match the order of insertion (use an OrderedDict if you need that).
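If the layout of the result must not depend on dictionary ordering at all, one option is to iterate over the sorted keys. The sketch below is an illustration under that assumption, not part of the original answer; it also uses `np.full` in place of `np.ones(...) * aa`, which produces the same values.

```python
import numpy as np

# Test dictionary from the question: key ii is repeated 10000 + ii times
td = {ii: 10000 + ii for ii in range(100)}

# Iterate over sorted keys so the layout does not depend on dict ordering
chunks = [np.full(td[key], key, dtype=float) for key in sorted(td)]

# One concatenation over the whole list of pieces
a = np.hstack(chunks)
```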
