
I have the following code that splits an array into a list of sub-arrays with np.array_split. I compare the first value (the m/z value) of consecutive tuples and start a new sub-array wherever the difference exceeds 2. So far so good.

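import numpy as np

# Read column 1 and the last column; names=True takes the field names ('mz', 'I') from the header row.
f = np.genfromtxt("d_n_isogro_ms.txt", names=True, dtype=None, usecols=(1,-1))

# Start a new sub-array wherever consecutive m/z values differ by more than 2.
dm  = np.absolute(np.diff(f['mz']))
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)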
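To make the splitting step reproducible without the data file, here is a minimal sketch that builds the excerpt shown below directly as a structured array. It assumes the field names 'mz' and 'I' and the dtype taken from the sample output; the in-memory array is only a stand-in for d_n_isogro_ms.txt.

```python
import numpy as np

# Stand-in for d_n_isogro_ms.txt: the excerpt from the question, using the
# dtype shown in the sample output (assumed field names 'mz' and 'I').
f = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[('mz', '<f8'), ('I', '<i8')],
)

# Absolute gap between consecutive m/z values.
dm = np.absolute(np.diff(f['mz']))
# A gap > 2 between elements k and k+1 means a new group starts at index k+1.
pos = np.where(dm > 2)[0] + 1
fsplit = np.array_split(f, pos)

print(pos)                        # [ 7 13]
print([len(a) for a in fsplit])   # [7, 6, 3]
```

On this excerpt the split positions come out as 7 and 13, giving groups of 7, 6 and 3 elements, which matches the sample output further down.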

This is what the sample input looks like (only an excerpt):

[(270.0332, 472) (271.0376, 1936) (272.0443, 11188) (273.0495, 65874)
 (274.0517, 8582) (275.0485, 4081) (276.0523, 659) (286.058, 1078)
 (287.0624, 4927) (288.0696, 22481) (289.0757, 84001) (290.078, 13688)
 (291.0746, 5402) (430.1533, 13995) (431.1577, 2992) (432.1685, 504)]
<type 'numpy.ndarray'>

For this particular data, the split positions are then:

pos = [7, 13]

And here is my sample output:

[array([(270.0332, 472), (271.0376, 1936), (272.0443, 11188),
        (273.0495, 65874), (274.0517, 8582), (275.0485, 4081),
        (276.0523, 659)], dtype=[('mz', '<f8'), ('I', '<i8')]),
 array([(286.058, 1078), (287.0624, 4927), (288.0696, 22481),
        (289.0757, 84001), (290.078, 13688), (291.0746, 5402)],
       dtype=[('mz', '<f8'), ('I', '<i8')]),
 array([(430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
       dtype=[('mz', '<f8'), ('I', '<i8')])]

I would like to compute the weighted average of each sub-array, i.e. the 'mz' values weighted by the 'I' values. Is there an efficient way of doing this with numpy? I am basically stuck on the indexing. Preferably, I would like to use the dtype field names to select the weights and the values.
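Written out, and assuming 'mz' holds the values and 'I' the weights (as the field names suggest), the quantity wanted for each sub-array g is the intensity-weighted mean, where m_k is the 'mz' value and I_k the 'I' value of element k:

```latex
\bar{m}_g = \frac{\sum_{k \in g} I_k \, m_k}{\sum_{k \in g} I_k}
```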

Maybe one could even do the whole operation on the fly.

Thank you very much for your help in advance.

Best, Christian

  • A sample input and output as an MCVE might help us help you. Commented Aug 26, 2015 at 11:05
  • Thanks. Sorry, I totally forgot. Commented Aug 26, 2015 at 11:06
  • Do you have a corresponding pos as well for the sample input and output that you just posted? pos could be useful, I think. Commented Aug 26, 2015 at 11:14
  • Might pos be [7,13] instead, for np.array_split(f, pos) to give the expected output? Also, as the final output you are looking for the weighted averages, right? So what should they be for the given sample input? Commented Aug 26, 2015 at 11:21

1 Answer


The output of np.array_split is a Python list of arrays of unequal lengths, so it cannot be processed as a single array. The best you can do in that case is a Python loop:

result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]
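A self-contained sketch of the loop on the excerpt from the question (rebuilt in memory as a stand-in for the data file), showing how the field names pick out values and weights, and checking that np.average with weights gives sum(I*mz)/sum(I) for each group:

```python
import numpy as np

# Stand-in for the file: the excerpt from the question.
f = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[('mz', '<f8'), ('I', '<i8')],
)

pos = np.where(np.abs(np.diff(f['mz'])) > 2)[0] + 1
fsplit = np.array_split(f, pos)

# f_i['mz'] selects the values, f_i['I'] the weights of each sub-array.
result = [np.average(f_i['mz'], weights=f_i['I']) for f_i in fsplit]

# Explicit form of the same weighted mean, for comparison.
manual = [(f_i['mz'] * f_i['I']).sum() / f_i['I'].sum() for f_i in fsplit]
print(np.allclose(result, manual))   # True
```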

But it is possible to come up with a completely vectorized solution by using np.add.reduceat instead of np.array_split:

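dm = np.abs(np.diff(f['mz']))
# Start index of every group: 0, plus each k where the gap before element k exceeds 2.
pos = np.flatnonzero(np.r_[True, dm > 2])

# Per-group sums of I*mz and of I; their ratio is the weighted average.
totals = np.add.reduceat(f['mz']*f['I'], pos)
counts = np.add.reduceat(f['I'], pos)
result = totals / counts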
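The following sketch runs the reduceat version on the excerpt from the question (again rebuilt in memory instead of read from the file) and cross-checks it against the per-group np.average loop:

```python
import numpy as np

# Stand-in for the file: the excerpt from the question.
f = np.array(
    [(270.0332, 472), (271.0376, 1936), (272.0443, 11188), (273.0495, 65874),
     (274.0517, 8582), (275.0485, 4081), (276.0523, 659), (286.058, 1078),
     (287.0624, 4927), (288.0696, 22481), (289.0757, 84001), (290.078, 13688),
     (291.0746, 5402), (430.1533, 13995), (431.1577, 2992), (432.1685, 504)],
    dtype=[('mz', '<f8'), ('I', '<i8')],
)

dm = np.abs(np.diff(f['mz']))
# Group start indices, including 0 for the first group.
pos = np.flatnonzero(np.r_[True, dm > 2])
print(pos)   # [ 0  7 13]

# reduceat sums each slice pos[j]:pos[j+1]; the last slice runs to the end.
totals = np.add.reduceat(f['mz'] * f['I'], pos)
counts = np.add.reduceat(f['I'], pos)   # summed intensity per group
result = totals / counts

# Cross-check against the loop over np.split at the interior boundaries.
loop = [np.average(g['mz'], weights=g['I']) for g in np.split(f, pos[1:])]
print(np.allclose(result, loop))   # True
```

Note that counts is the summed weight of each group, not an element count. reduceat relies on pos being strictly increasing (if indices[i] >= indices[i+1], that entry is simply the element at indices[i]), which holds here because np.flatnonzero returns sorted indices.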

1 Comment

Thank you very much. First of all, it works. Second, I like the vectorized solution and will use it in future scripts.
