
NumPy's "vectorized" fancy indexing sometimes gives unexpected results.

For example:

import numpy
a = numpy.zeros((1000,4), dtype='uint32')
b = numpy.zeros((1000,4), dtype='uint32')
i = numpy.random.random_integers(0,999,1000)
j = numpy.random.random_integers(0,3,1000)

a[i,j] += 1
for k in xrange(1000):
    b[i[k],j[k]] += 1

The arrays 'a' and 'b' end up different: with fancy indexing, a repeated (i,j) pair increments a[i,j] only once, whereas the for loop counts every repeat in 'b'.

This is easily verified as follows:

numpy.sum(a)
883
numpy.sum(b)
1000

It is also notable that the fancy indexing version is almost two orders of magnitude faster than the for loop.

Is there an efficient way for NumPy to compute the repeat counts as implemented using the for loop in the provided example?
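One candidate, assuming a NumPy new enough to have it (1.8+), is numpy.add.at, an unbuffered in-place operation that applies the increment once per occurrence of an index pair and therefore counts repeats exactly like the loop (sketch below also assumes numpy.random.default_rng, NumPy 1.17+):

```python
import numpy

a = numpy.zeros((1000, 4), dtype='uint32')
b = numpy.zeros((1000, 4), dtype='uint32')
rng = numpy.random.default_rng(0)
i = rng.integers(0, 1000, 1000)
j = rng.integers(0, 4, 1000)

# Unbuffered in-place add: each occurrence of an (i, j) pair counts.
numpy.add.at(a, (i, j), 1)

# Reference loop implementation for comparison.
for k in range(1000):
    b[i[k], j[k]] += 1

print((a == b).all())  # True
print(a.sum())         # 1000, one increment per sample
```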

  • I'm not quite sure I understand the question. Wouldn't that always be the length of i and j? Is there a case where it isn't? Commented Jun 12, 2012 at 16:39
  • Nope. In the fancy indexing case, a[i,j] += 1, any repeated (i,j) combination only increments a[i,j] once. It is as if the right-hand side were evaluated all at once and the result written back in a single step, so an entry of 'a' is set to 1 no matter how often its index pair occurs. In the for-loop case, b[i[k],j[k]] += 1, repeated (i,j) pairs each increment their count by 1. Commented Jun 12, 2012 at 17:06
  • I see that the two indexing methods yield different results, but wouldn't the for-loop indexing always result in numpy.sum(b) == j.size? Because for each element in i, j some element of b is incremented once. Commented Jun 12, 2012 at 18:10
  • Yes. The for-loop version of b always adds up to j.size. However, the result I need is the final counts in b. Is there a fast way to compute b? Commented Jun 12, 2012 at 18:42
  • Ah, now I understand. You're not interested in sum(b), but in b itself. Me blockhead today. Commented Jun 12, 2012 at 18:45

2 Answers


Ben,

The following modification to your code is a more accurate reflection of my larger problem:



    from timeit import timeit
    import numpy

    # Dimensions
    ROWS = 10000000
    SAMPLES = 10000
    COLS = 4

    # Number of times to run timeit
    TEST_COUNT = 100 

    # Matrices
    a = numpy.zeros((ROWS, COLS), dtype='uint32')
    b = numpy.zeros((ROWS, COLS), dtype='uint32')
    c = numpy.zeros((ROWS, COLS), dtype='uint32')

    # Indices
    i = numpy.random.random_integers(0, ROWS - 1, SAMPLES)
    j = numpy.random.random_integers(0, COLS - 1, SAMPLES)

    # Fancy
    def fancy():
        a[i,j] += 1
    print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)

    # Loop
    def loop():
        for k in xrange(SAMPLES):
            b[i[k],j[k]] += 1
    print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)

    # Flat
    def flat():
        flatIndices = i*COLS + j
        sums = numpy.bincount(flatIndices)
        c.flat[:len(sums)] += sums

    print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)

    # Check that loop and flat are equivalent
    print "Equivalent:", (c == b).all()

The one difference is that, at this scale, probably few samples are actually repeated; let's ignore that for now. On my machine, your code gave the following timings:



    Fancy: 0.109203100204
    Loop: 7.37937998772
    Flat: 126.571173906
    Equivalent: True

This slowdown is largely attributable to bincount(): its output is as long as the largest flat index encountered (here up to ROWS * COLS entries), so a huge, mostly-zero array is allocated and added into c on every call.
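A sketch of one way around this, assuming numpy.unique with return_counts (added in NumPy 1.9) and numpy.random.default_rng (1.17+), is to count only the flat indices that actually occur instead of materializing the giant bincount output; the dimensions are scaled down here to keep the example light:

```python
import numpy

ROWS, COLS, SAMPLES = 1000000, 4, 10000
c = numpy.zeros((ROWS, COLS), dtype='uint32')
rng = numpy.random.default_rng(1)
i = rng.integers(0, ROWS, SAMPLES)
j = rng.integers(0, COLS, SAMPLES)

flatIndices = i * COLS + j
# Count only the bins that occur, instead of a ROWS*COLS-long bincount.
uniqueIndices, counts = numpy.unique(flatIndices, return_counts=True)
c.flat[uniqueIndices] += counts.astype(c.dtype)

print(c.sum() == SAMPLES)  # True: every sample counted, repeats included
```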

  • See the second edit in my post. I'm off to bed. Commented Jun 12, 2012 at 21:50

This is what I came up with (see the flat function at the bottom):

from timeit import timeit
import numpy

# Dimensions
ROWS = 1000
COLS = 4

# Number of times to run timeit
TEST_COUNT = 100 

# Matrices
a = numpy.zeros((ROWS, COLS), dtype='uint32')
b = numpy.zeros((ROWS, COLS), dtype='uint32')
c = numpy.zeros((ROWS, COLS), dtype='uint32')

# Indices
i = numpy.random.random_integers(0, ROWS - 1, ROWS)
j = numpy.random.random_integers(0, COLS - 1, ROWS)

# Fancy
def fancy():
    a[i,j] += 1
print "Fancy:", timeit("fancy()", setup="from __main__ import fancy", number=TEST_COUNT)

# Loop
def loop():
    for k in xrange(ROWS):
        b[i[k],j[k]] += 1
print "Loop:", timeit("loop()", setup="from __main__ import loop", number=TEST_COUNT)

# Flat
def flat():
    flatIndices = i*COLS + j
    sums = numpy.bincount(flatIndices)
    c.flat[:len(sums)] += sums

print "Flat:", timeit("flat()", setup="from __main__ import flat", number=TEST_COUNT)

# Check that loop and flat are equivalent
print "Equivalent:", (c == b).all()

On my machine, that prints:

Fancy: 0.0117284889374
Loop: 0.937140445391
Flat: 0.0165873246295
Equivalent: True

So the flat version is about 50% slower than fancy, but much faster than loop. It does take up some memory for the flatIndices and sums, though. Note that c.flat does not create a copy of c, it only provides a linear interface to the array.
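That view behaviour is easy to check (a minimal sketch): writing through .flat mutates the original 2-D array rather than a copy.

```python
import numpy

c = numpy.zeros((3, 4), dtype='uint32')
c.flat[5] = 7   # flat index 5 == row 1, col 1 in a 4-column array
print(c[1, 1])  # 7: .flat wrote through to c, no copy was made
```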

Edit: Out of curiosity, I ran another test with COLS = 1e6 with the following results:

Fancy: 24.6063425241
Loop: N/A
Flat: 22.4316268267

Memory usage according to the Windows task manager (not the most accurate measuring tool, I know) was at about 80 MB base-line, spiking up to 130 MB during flat().

Edit Nr 2: Another attempt with your benchmark:

def flat():
    flatIndices = i*COLS + j
    uniqueIndices = numpy.unique(flatIndices)
    bins = numpy.append(uniqueIndices, uniqueIndices.max() + 1)
    histo = numpy.histogram(flatIndices, bins)
    c.flat[uniqueIndices] += histo[0]

This gives me

Fancy: 0.143004997089
Loop: 9.48711325557
Flat: 0.474580977267
Equivalent: True

This looks much better. Unfortunately, I couldn't find a way to avoid the bins array since numpy.histogram treats the last bin as a closed interval. So if you only pass uniqueIndices as bins, histogram will put both the last and second to last index in the same bin, which is usually wrong.
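The closed last bin is easy to demonstrate (a minimal sketch):

```python
import numpy

data = numpy.array([0, 1, 2, 2])
counts, edges = numpy.histogram(data, bins=[0, 1, 2])
# Bins are [0, 1) and [1, 2]; the last bin is a closed interval,
# so the 1 and both 2s all land in the second bin.
print(counts)  # [1 3]
```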
