Python numpy averaging

Question

Averaging a rectangular table like this is not a problem:

table = [[1,2,3,0],[1,2,3,0],[1,2,3,4]]

You can compute the column-wise average with:

print(numpy.average(table, axis=0))

But what if I have uneven sequences like:

table = [[1,2,3],[1,2,3],[1,2,3,4]]

Then the result should be:

1,2,3,4

The fourth column contains only one value, 4, so its average is 4/1 = 4. But numpy will not allow this and raises:

ValueError: setting an array element with a sequence.

Where does your data come from, and why aren't the sub-lists the same length? – Karl Knechtel, Commented Nov 12, 2011 at 18:49
This probably isn't a good question, but -- do you have to use numpy? – Matt Fenwick, Commented Nov 12, 2011 at 19:01

unutbu · Accepted Answer · 2011-11-12 19:14:59Z

3

You could feed the data into a numpy masked array, then compute the means with np.ma.mean:

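import numpy as np

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

rows = len(data)
cols = max(len(row) for row in data)  # width of the longest row

# Start with every cell masked; assignment below unmasks the cells it fills.
arr = np.ma.zeros((rows, cols))
arr.mask = True
for i, row in enumerate(data):
    arr[i, :len(row)] = row

# Masked cells are ignored, so each column is averaged over its real values.
print(arr.mean(axis=0))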

yields

[1.0 2.0 3.0 4.0]

Elements of the array get unmasked (i.e. arr.mask[i,j]=False) when a value is assigned. Note the resultant mask below:

In [162]: arr
Out[162]: 
masked_array(data =
 [[1.0 2.0 3.0 --]
 [1.0 2.0 3.0 --]
 [1.0 2.0 3.0 4.0]],
             mask =
 [[False False False  True]
 [False False False  True]
 [False False False False]],
       fill_value = 1e+20)
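To see why the masked mean comes out as expected, it can help to look at the per-column counts and sums that the mean is built from. The following is a minimal sketch using the same construction as above; the variable names `counts`, `sums`, and `means` are only illustrative.

```python
import numpy as np

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]
cols = max(len(row) for row in data)

# Same setup as in the answer: fully masked array, unmasked by assignment.
arr = np.ma.zeros((len(data), cols))
arr.mask = True
for i, row in enumerate(data):
    arr[i, :len(row)] = row

counts = arr.count(axis=0)  # number of unmasked values per column
sums = arr.sum(axis=0)      # sum over unmasked values only
means = arr.mean(axis=0)    # effectively sums / counts

print(counts)
print(sums)
print(means)
```

The last column has a count of 1, which is why its mean is 4 rather than 4/3.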

If your data set is small, yosukesabai's method or a pure Python solution is likely to be faster than what I show above. Only invest in making a masked array if the data is very large and you have enough numpy operations to perform on the array to make the initial cost of setting it up worthwhile.
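For reference, a pure Python version might look like the sketch below. It assumes Python 3 (`itertools.zip_longest`) and that `None` never appears as a real data value, since `None` is used as the padding marker; the helper name `column_means` is hypothetical.

```python
from itertools import zip_longest


def column_means(table):
    """Return the mean of each column, ignoring positions missing in short rows."""
    means = []
    # zip_longest pads short rows with None so every column has the same length.
    for column in zip_longest(*table):
        values = [x for x in column if x is not None]
        means.append(sum(values) / len(values))
    return means


result = column_means([[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]])
print(result)  # [1.0, 2.0, 3.0, 4.0]
```

This avoids building any array at all, which is the setup cost the paragraph above refers to.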

edited Nov 12, 2011 at 19:14

answered Nov 12, 2011 at 19:07


2 Comments

unutbu Over a year ago

@yosukesabai: Thanks. I like your method too. Please undelete it :)

yosukesabai Over a year ago

Undeleted. Interesting to know about the setup cost, which I often forget.

yosukesabai · Accepted Answer · 2011-11-12 19:09:29Z

2

The only workaround I can think of is to use itertools and a temporary list, which is not very elegant.

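import numpy as np
from itertools import zip_longest  # named izip_longest in Python 2

table = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

# zip_longest transposes the table, padding short rows with None.
for column in zip_longest(*table):
    # Drop the padding before averaging each column.
    print(np.average([x for x in column if x is not None]))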

This yields

1.0
2.0
3.0
4.0

answered Nov 12, 2011 at 19:09

