This is what I came up with (see the flat function at the bottom):
from timeit import timeit
import numpy
# Dimensions
ROWS = 1000
COLS = 4
# Number of times to run timeit
TEST_COUNT = 100
# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')
# Indices
i = numpy.random.random_integers(0, ROWS - 1, ROWS)
j = numpy.random.random_integers(0, COLS - 1, ROWS)
# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)
# Loop
def loop():
    for k in xrange(ROWS):
        b[i[k], j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)
# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums
print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)
# Check that loop and flat are equivalent
print "Equivalent:", (c == b).all()
On my machine, that prints:
Fancy: 0.0117284889374
Loop: 0.937140445391
Flat: 0.0165873246295
Equivalent: True
So the flat version is roughly 40% slower than fancy, but much faster than loop. It does take up some extra memory for flatIndices and sums, though. Note that c.flat does not create a copy of c; it only provides a linear interface to the array.
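To illustrate what flat() does, here is a minimal sketch on a made-up 2x3 array (the array and indices are invented for the example); it also shows that writes through .flat land in the original array rather than a copy:

import numpy
# Made-up 2x3 example, not the benchmark data
d = numpy.zeros((2, 3), dtype='uint32')
ii = numpy.array([0, 0, 1])            # row indices; (0, 1) appears twice
jj = numpy.array([1, 1, 2])            # column indices
flatIndices = ii*3 + jj                # linear indices into the flattened array: [1 1 5]
sums = numpy.bincount(flatIndices)     # occurrences of each linear index: [0 2 0 0 0 1]
d.flat[:len(sums)] += sums             # written through .flat, so d itself changes
print(d)                               # [[0 2 0]
                                       #  [0 0 1]]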
Edit: Out of curiosity, I ran another test with COLS = 1e6, with the following results:
Fancy: 24.6063425241
Loop: N/A
Flat: 22.4316268267
Memory usage according to the Windows task manager (not the most accurate measuring tool, I know) was at about 80 MB baseline, spiking up to 130 MB during flat().
Edit Nr 2: Another attempt with your benchmark:
def flat():
    flatIndices = i*COLS + j
    uniqueIndices = numpy.unique(flatIndices)
    # One bin edge per unique index, plus an extra edge so the largest
    # index gets its own (closed) bin
    bins = numpy.append(uniqueIndices, uniqueIndices.max() + 1)
    histo = numpy.histogram(flatIndices, bins)
    c.flat[uniqueIndices] += histo[0]
This gives me
Fancy: 0.143004997089
Loop: 9.48711325557
Flat: 0.474580977267
Equivalent: True
This looks much better. Unfortunately, I couldn't find a way to avoid building the bins array, since numpy.histogram treats the last bin as a closed interval. If you pass only uniqueIndices as bins, histogram puts both the last and the second-to-last index into the same bin, which is usually wrong.
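A tiny sketch with made-up values shows that edge case, and why the extra bin edge fixes it:

import numpy
values = numpy.array([0, 2, 3, 3])
edges = numpy.array([0, 2, 3])                 # "uniqueIndices" alone
print(numpy.histogram(values, edges)[0])       # [1 3] -- 2 and 3 share the closed last bin
edges2 = numpy.append(edges, edges.max() + 1)  # extra edge, like the bins array above
print(numpy.histogram(values, edges2)[0])      # [1 1 2] -- one bin per unique index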