Numpy: Efficient access to sub-arrays generated by numpy.split

Question

I have the following code, which generates a list of sub-arrays using np.array_split. I compare the first value ('mz') of consecutive tuples and start a new sub-array wherever the difference exceeds 2. So far so good.

import numpy as np

f = np.genfromtxt("d_n_isogro_ms.txt", names=True, dtype=None, usecols=(1,-1))

dm  = np.absolute(np.diff(f['mz']))
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)

This is what the sample input looks like (only an excerpt):

[(270.0332, 472) (271.0376, 1936) (272.0443, 11188) (273.0495, 65874)
 (274.0517, 8582) (275.0485, 4081) (276.0523, 659) (286.058, 1078)
 (287.0624, 4927) (288.0696, 22481) (289.0757, 84001) (290.078, 13688)
 (291.0746, 5402) (430.1533, 13995) (431.1577, 2992) (432.1685, 504)]
<type 'numpy.ndarray'>

The split positions for this particular data are then computed as:

pos = [7,12]

And here is my sample output:

[array([(270.0332, 472), (271.0376, 1936), (272.0443, 11188),
       (273.0495, 65874), (274.0517, 8582), (275.0485, 4081),
       (276.0523, 659)], dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(286.058, 1078), (287.0624, 4927), (288.0696, 22481),
   (289.0757, 84001), (290.078, 13688), (291.0746, 5402)], 
  dtype=[('mz', '<f8'), ('I', '<i8')]),
array([(430.1533, 13995),
   (431.1577, 2992), (432.1685, 504)], 
  dtype=[('mz', '<f8'), ('I', '<i8')])]

I would like to compute the weighted average of each of the arrays. Is there an efficient way of doing this with numpy? My attempts fail at the indexing. Preferably, I would like to use the dtype field names to identify the weights and the values.

Maybe the whole operation could even be done on the fly.

Thank you very much for your help in advance.

Best, Christian

Do you have a corresponding pos as well for the sample input, output that you just posted? pos could be useful I think.
– Divakar, Commented Aug 26, 2015 at 11:14
pos might be [7,13] instead for np.array_split(f, pos) to give the expected output? Also, as the final output, you are looking to have weighted average, right? So, what must that be for given sample inputs?
– Divakar, Commented Aug 26, 2015 at 11:21

user2379410 · Accepted Answer · 2015-08-26 11:39:57Z

The output of np.array_split is a Python list containing arrays of unequal lengths. Given that list, the best you can do is a Python loop:

result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]
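As a self-contained sketch of the loop approach, the snippet below rebuilds the question's sample excerpt as a structured array in place of the text file, which is an assumption made only so the example runs on its own. The field names 'mz' and 'I' are taken from the dtype shown in the question's output; the name loop_result is illustrative.

```python
import numpy as np

# Sample excerpt from the question, rebuilt by hand instead of being read
# from the text file (assumption: same field names and dtypes as shown there).
f = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[('mz', '<f8'), ('I', '<i8')],
)

# Split wherever consecutive 'mz' values differ by more than 2.
dm = np.abs(np.diff(f['mz']))
pos = np.flatnonzero(dm > 2) + 1
fsplit = np.array_split(f, pos)

# One weighted average per sub-array: 'mz' values weighted by intensity 'I'.
loop_result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]
print(loop_result)
```

Each entry of loop_result should fall inside the 'mz' range of its own sub-array, which is a quick sanity check on the field indexing.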

However, a completely vectorized solution is possible by using np.add.reduceat instead of np.array_split:

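dm = np.abs(np.diff(f['mz']))
# Start index of each group: index 0 plus every index that follows a gap > 2
pos = np.flatnonzero(np.r_[True, dm > 2])

# Per-group sum of mz * I, and per-group sum of the weights I
totals = np.add.reduceat(f['mz']*f['I'], pos)
counts = np.add.reduceat(f['I'], pos)
result = totals / counts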
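To check that the reduceat version agrees with the loop, here is a minimal sketch run on the question's sample excerpt. The data is rebuilt by hand rather than read from the file, and the names starts, weighted_sums, weight_sums, vectorized and reference are illustrative, not part of the original code.

```python
import numpy as np

# Sample excerpt from the question as a structured array (assumption:
# built by hand here so the example is self-contained).
f = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[('mz', '<f8'), ('I', '<i8')],
)

# Group start indices: 0, plus every index that follows a gap > 2 in 'mz'.
dm = np.abs(np.diff(f['mz']))
starts = np.flatnonzero(np.r_[True, dm > 2])

# reduceat sums each slice f[starts[i]:starts[i+1]] (the last slice runs to the end).
weighted_sums = np.add.reduceat(f['mz'] * f['I'], starts)
weight_sums = np.add.reduceat(f['I'], starts)
vectorized = weighted_sums / weight_sums

# Reference: the explicit loop over the same groups.
reference = np.array(
    [np.average(g['mz'], weights=g['I']) for g in np.split(f, starts[1:])]
)
print(np.allclose(vectorized, reference))
```

Because the indices from np.flatnonzero are strictly increasing, each reduceat segment matches exactly one sub-array of the split, so no intermediate list of arrays is needed.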

Christian Opitz Over a year ago

Thank you very much. First off, it works. Second, I like the vectorized solution and will use it for future scripts.
