
Averaging a table like this is not a problem

table = [[1,2,3,0],[1,2,3,0],[1,2,3,4]]

You can simply call:

print(numpy.average(table, axis=0))

But what if I have uneven sequences like:

table = [[1,2,3],[1,2,3],[1,2,3,4]]

Then the result should be:

1,2,3,4

Since the value 4 occurs only once in its column, its average is 4/1 = 4. But numpy will not allow this:

ValueError: setting an array element with a sequence.

  • Where does your data come from, and why aren't the sub-lists the same length? Commented Nov 12, 2011 at 18:49
  • genomic data, different gene lengths Commented Nov 12, 2011 at 18:52
  • This probably isn't a good question, but -- do you have to use numpy? Commented Nov 12, 2011 at 19:01

2 Answers

You could feed the data into a numpy masked array, then compute the means with np.ma.mean:

import numpy as np

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

rows = len(data)
cols = max(len(row) for row in data)
# Start with a fully masked array of zeros.
arr = np.ma.zeros((rows, cols))
arr.mask = True
# Assigning a row unmasks the positions that receive values.
for i, row in enumerate(data):
    arr[i, :len(row)] = row

print(arr.mean(axis=0))

yields

[1.0 2.0 3.0 4.0]

Elements of the array get unmasked (i.e. arr.mask[i,j]=False) when a value is assigned. Note the resultant mask below:

In [162]: arr
Out[162]: 
masked_array(data =
 [[1.0 2.0 3.0 --]
 [1.0 2.0 3.0 --]
 [1.0 2.0 3.0 4.0]],
             mask =
 [[False False False  True]
 [False False False  True]
 [False False False False]],
       fill_value = 1e+20)
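
A related approach, not part of the answer as posted, is to pad the short rows with NaN and let np.nanmean skip the padding. This is only a minimal sketch, assuming the data are numeric and contain no genuine NaN values; the names padded and means are illustrative.

```python
import numpy as np

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

rows = len(data)
cols = max(len(row) for row in data)

# Fill a float array with NaN, then overwrite the positions that have data.
padded = np.full((rows, cols), np.nan)
for i, row in enumerate(data):
    padded[i, :len(row)] = row

# nanmean ignores NaN entries when averaging each column.
means = np.nanmean(padded, axis=0)
print(means)
```

Note that a column consisting only of NaN would produce NaN and a RuntimeWarning, which cannot happen here because every column has at least one value.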

If your data is rather short, yosukesabai's method or a pure Python solution is likely to be faster than what I show above. Only invest in making a masked array if the data is very large and you have enough numpy operations to perform on the array to make the initial cost of setting up the array worth it.
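
For reference, a pure Python version might look like the sketch below. It assumes Python 3 (where itertools.zip_longest replaces izip_longest) and numeric rows; column_means and _MISSING are hypothetical names chosen for illustration.

```python
from itertools import zip_longest

# Sentinel used as padding so that real values such as None or 0 are never dropped.
_MISSING = object()

def column_means(table):
    """Return the mean of each column, ignoring positions missing from short rows."""
    means = []
    for column in zip_longest(*table, fillvalue=_MISSING):
        values = [x for x in column if x is not _MISSING]
        means.append(sum(values) / len(values))
    return means

print(column_means([[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]))
```

No array is built here, so there is no setup cost, at the price of doing the arithmetic in the interpreter.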

2 Comments

@yosukesabai: Thanks. I like your method too. Please undelete it :)
undeleted. interesting to know about setup cost, which I often forget.

The only workaround I can think of is to use itertools and a temporary list, which is not very pretty.

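import numpy as np
from itertools import zip_longest  # named izip_longest in Python 2

table = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

# zip_longest pads the shorter rows with None; drop the padding before averaging.
for column in zip_longest(*table):
    print(np.average([x for x in column if x is not None]))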

This yields

1.0
2.0
3.0
4.0
