I'm trying to get the average row/col position using the array values as the weights. This seems to work, but just feels wrong:

a = numpy.random.ranf(size=(5,5))
normalized_a = a/numpy.nansum(a)

row_values = []
col_values = []
for row, col in numpy.ndindex(normalized_a.shape):
    weight = int(normalized_a[row, col] * 100)
    row_values.extend([row] * weight)
    col_values.extend([col] * weight)

print "average row:", sum(row_values)/float(len(row_values))
print "average col:", sum(col_values)/float(len(col_values))

Is there a more efficient way to do this in numpy?

2 Answers

The key insight for speeding this up: in the row calculation, every item in a given row is multiplied by the same value, the row index (and likewise, in the column calculation, every item in a given column is multiplied by the column index). So it is faster to add the weights of each row (column) together first, then multiply each sum by its row (column) index. If your array is m x n, that reduces the number of multiplications from 2 * m * n to m + n. And since what remains is just multiplications followed by additions, you can use np.dot to squeeze out the last bit of performance. Building on @mgilson's tests:

import numpy as np

def new3(normalized_a):
    weights = np.floor(normalized_a * 100)
    total_wt = np.sum(weights)
    rows, cols = weights.shape
    # Sum the weights along each row (column) first, then weight the indices.
    row_values = np.dot(weights.sum(axis=1), np.arange(rows)) / total_wt
    col_values = np.dot(weights.sum(axis=0), np.arange(cols)) / total_wt
    return row_values, col_values

And these are my results and timings:

(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
(1.8352941176470587, 2.388235294117647)
timing!!!
2.59478258085
1.33357909978
1.0771122333
0.487124971828 #new3
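
As a sanity check on the reasoning above, here is a minimal Python 3 sketch comparing the sum-then-dot form against the full-grid form. The names centroid_dot and centroid_grid are hypothetical, chosen for illustration only, and the seeded 5 x 6 input is an assumption rather than the array used in the timings.

```python
import numpy as np

def centroid_dot(weights):
    """Weighted mean (row, col) index using m + n multiplications."""
    total = weights.sum()
    n_rows, n_cols = weights.shape
    # Collapse each row to a single weight, then weight the row indices.
    row_mean = np.dot(weights.sum(axis=1), np.arange(n_rows)) / total
    # Collapse each column to a single weight, then weight the column indices.
    col_mean = np.dot(weights.sum(axis=0), np.arange(n_cols)) / total
    return row_mean, col_mean

def centroid_grid(weights):
    """Same quantity, computed with full index grids (2 * m * n multiplications)."""
    rows, cols = np.indices(weights.shape)
    total = weights.sum()
    return (rows * weights).sum() / total, (cols * weights).sum() / total

# Hypothetical test input: quantized weights as in the question.
rng = np.random.default_rng(0)
a = rng.random((5, 6))
weights = np.floor(a / a.sum() * 100)

print(centroid_dot(weights))
print(centroid_grid(weights))
```

Both functions should print the same pair up to floating-point rounding, since the two forms differ only in the order of summation.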
3 Comments

Well done. +1 from me -- Although, I must say, I'm a bit disappointed. I thought my answer was pretty good until I saw this one ...
Seems this question only brings grief to respondents, for I am not very happy either: "Dear Santa, I've been a very good boy this year, please bring me a PC like @mgilson's, that runs twice as fast as mine..." ;-)
Mine's a linux machine courtesy of my employer :)

These seem to be a bit better:

import numpy

a = numpy.random.ranf(size=(5,6))
normalized_a = a/numpy.nansum(a)

def original(a, normalized_a):
  row_values = []
  col_values = []
  for row, col in numpy.ndindex(normalized_a.shape):
    weight = int(normalized_a[row, col] * 100)
    row_values.extend([row] * weight)
    col_values.extend([col] * weight)

  return sum(row_values)/float(len(row_values)), sum(col_values)/float(len(col_values))


def new(a, normalized_a):
  weights = numpy.floor(normalized_a * 100)
  nx, ny = a.shape
  rows, columns = numpy.mgrid[:nx, :ny]
  row_values = numpy.sum(rows * weights)/numpy.sum(weights)
  col_values = numpy.sum(columns * weights)/numpy.sum(weights)
  return row_values, col_values


def new2(a, normalized_a):
  weights = numpy.floor(normalized_a * 100)
  nx, ny = a.shape
  rows, columns = numpy.ogrid[:nx, :ny]
  row_values = numpy.sum(rows * weights)/numpy.sum(weights)
  col_values = numpy.sum(columns * weights)/numpy.sum(weights)
  return row_values, col_values


print original(a, normalized_a)
print new(a, normalized_a)
print new2(a, normalized_a)


print "timing!!!"

import timeit
print timeit.timeit('original(a, normalized_a)', 'from __main__ import original, a, normalized_a', number=10000)
print timeit.timeit('new(a, normalized_a)', 'from __main__ import new, a, normalized_a', number=10000)
print timeit.timeit('new2(a, normalized_a)', 'from __main__ import new2, a, normalized_a', number=10000)

The results on my computer:

(1.8928571428571428, 2.630952380952381)
(1.8928571428571428, 2.6309523809523809)
(1.8928571428571428, 2.6309523809523809)
timing!!!
1.05751299858
0.64871096611
0.497050046921

I used some of numpy's index tricks to vectorize the computation. I'm actually a little surprised that we didn't do better. np.ogrid is only about twice as fast as the original on your test matrix. np.mgrid falls somewhere in between.
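
A likely reason for the gap between the two grid versions is that np.mgrid builds full-size index arrays, while np.ogrid returns "open" arrays that rely on broadcasting. A small Python 3 sketch of the difference follows; the weights array here is a made-up example, not the test matrix above.

```python
import numpy as np

nx, ny = 5, 6

# mgrid: every index array has the full (nx, ny) shape.
dense_rows, dense_cols = np.mgrid[:nx, :ny]
# ogrid: one column vector and one row vector; broadcasting does the rest.
open_rows, open_cols = np.ogrid[:nx, :ny]

print(dense_rows.shape, dense_cols.shape)  # (5, 6) (5, 6)
print(open_rows.shape, open_cols.shape)    # (5, 1) (1, 6)

# Hypothetical weights, just to show both grids give the same products.
weights = np.arange(nx * ny, dtype=float).reshape(nx, ny)
assert np.array_equal(dense_rows * weights, open_rows * weights)
assert np.array_equal(dense_cols * weights, open_cols * weights)
```

The products are identical either way; the open grid simply avoids allocating the two full index arrays before the multiplication.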
