
I am trying to convert a vanilla Python standard deviation calculation, which works on windows of n elements (n is set by the variable number), into NumPy form. However, the NumPy code is faulty and raises "only integer scalar arrays can be converted to a scalar index". Is there any way I could bypass this?

Variables

import numpy as np
number = 5
list_= np.array([457.334015,424.440002,394.795990,408.903992,398.821014,402.152008,435.790985,423.204987,411.574005,
404.424988,399.519989,377.181000,375.467010,386.944000,383.614990,375.071991,359.511993,328.865997,
320.510010,330.079010,336.187012,352.940002,365.026001,361.562012,362.299011,378.549011,390.414001,
400.869995,394.773010,382.556000])

Vanilla python

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number)])

Numpy form

counter = np.arange(0, len(list_)-number, 1)
std = list_[counter:counter+number].std()
  • You cannot use a NumPy array (the result of arange) as a slice start or stop. arr[1:10] is OK; arr[np.array([1,2,3]):np.array([4,5,6])] is not! What were you hoping it would produce? Commented Jan 12, 2021 at 23:29

2 Answers

Here arr is the array that the question calls list_.

In [46]: std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)])
In [47]: std
Out[47]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We can move the std out of the loop. Make a 2d array of windows, and apply std with axis:

In [48]: np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis=1)
Out[48]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

We could also generate the windows with indexing. A convenient way is to use linspace:

In [63]: idx = np.arange(0,len(arr)-number)
In [64]: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
In [65]: idx
Out[65]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24],
         ...
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        20, 21, 22, 23, 24, 25, 26, 27, 28]])
In [66]: arr[idx].std(axis=0)
Out[66]: 
array([22.67653383, 10.3940773 , 14.60076482, 13.82801944, 13.68038469,
       12.54834004, 13.13574418, 15.24698722, 14.65383773, 11.62092989,
        8.57331689,  4.76392583,  9.49404494, 21.20874383, 24.91417226,
       20.84991841, 13.22152789, 10.83343482, 16.01294245, 13.80007894,
       10.51866421,  8.29287433, 11.24933733, 15.43661128, 13.65945978])

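The rolling-window approach using as_strided (the rolling_window function from the other answer) is another option, but may be harder to understand. In the timings below it runs about as fast as the 2d-array version.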
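If as_strided feels opaque, NumPy 1.20 and later ship sliding_window_view, which builds the same kind of windowed view with bounds checking. The sketch below is not part of the original timings; it uses a shortened copy of the question's data and assumes a recent enough NumPy.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# First eight values of the question's array, shortened for illustration.
arr = np.array([457.334015, 424.440002, 394.795990, 408.903992,
                398.821014, 402.152008, 435.790985, 423.204987])
number = 5

# Each row is a read-only view of `number` consecutive elements;
# building the view itself copies no data.
windows = sliding_window_view(arr, number)  # shape (len(arr) - number + 1, number)
rolling_std = windows.std(axis=1)
```

Note that this produces len(arr) - number + 1 windows, so it includes the final window that range(0, len(arr)-number) leaves out.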

In [67]: timeit std= np.array([arr[i:i+number].std() for i in range(0, len(arr)-number)])
1.05 ms ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [68]: timeit np.array([arr[i:i+number] for i in range(0, len(arr)-number)]).std(axis=1)
74.7 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [69]: %%timeit
    ...: idx = np.arange(0,len(arr)-number)
    ...: idx = np.linspace(idx,idx+number,number, endpoint=False,dtype=int)
    ...: arr[idx].std(axis=0)
117 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [73]: timeit np.std(rolling_window(arr, 5), 1)
74.5 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using a more direct way to generate the rolling index:

In [81]: %%timeit
    ...: idx = np.arange(len(arr)-number)[:,None]+np.arange(number)
    ...: arr[idx].std(axis=1)
57.9 µs ± 87.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
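To see what that broadcast index contains, here is a small sketch on a toy array (the values and the names starts and offsets are made up for illustration). A column of window start positions plus a row of offsets broadcasts to a 2d index with one window per row.

```python
import numpy as np

arr = np.arange(10, 17)   # toy data: [10 11 12 13 14 15 16]
number = 3

starts = np.arange(len(arr) - number)[:, None]  # shape (4, 1): window start positions
offsets = np.arange(number)                     # shape (3,): positions inside a window
idx = starts + offsets                          # broadcasts to shape (4, 3)

windows = arr[idx]        # advanced indexing copies the selected elements
print(idx)
print(windows)
```

Because arr[idx] is advanced indexing, it allocates a new array of len(idx) * number elements (plus the index array itself), which is what becomes expensive for long inputs and wide windows.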

Your error:

In [82]: arr[np.array([1,2,3]):np.array([4,5,6])]
Traceback (most recent call last):
  File "<ipython-input-82-3358e59f8fb5>", line 1, in <module>
    arr[np.array([1,2,3]):np.array([4,5,6])]
TypeError: only integer scalar arrays can be converted to a scalar index

5 Comments

Thanks for the detailed explanation. Isn't there a way I can avoid the for loop? I am trying to make my code run faster.
My bad, I didn't mean to come off in a disrespectful way. I just don't really understand the as_strided section.
Simply moving std out of the loop gave better than 10x improvement. The as_strided version isn't fastest, so if you don't understand it, don't worry. The question of how to take multiple slices comes up fairly often. A simple slice is fast (a view), but multiple ones require some sort of copy - advanced indexing or concatenate.
Yeah, I thought I could use the last one, which takes 57.9 µs, in my program, which works on a very long list of more than 2 million elements. It crashed with the error message "Unable to allocate 14.2 GiB for an array with shape (2640651, 1440) and data type int32". But thanks anyway, man.
Yes, for larger arrays, iteration as you initially did may be necessary. Even if you don't have memory errors, there are time tradeoffs between memory management and iteration. You could also look into using numba to compile the task.
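One middle ground between a full Python loop and a single huge index array is to process the windows in blocks. The following is a minimal sketch, not something from the answers: rolling_std_chunked and its chunk parameter are hypothetical names, and it assumes NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_std_chunked(a, window, chunk=10_000):
    """Rolling population std, computed `chunk` windows at a time (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    n_windows = a.shape[0] - window + 1
    if n_windows <= 0:
        return np.empty(0)
    views = sliding_window_view(a, window)   # a view of the data, no copy yet
    out = np.empty(n_windows)
    for start in range(0, n_windows, chunk):
        stop = min(start + chunk, n_windows)
        # std allocates temporaries only for this block of windows
        out[start:stop] = views[start:stop].std(axis=1)
    return out
```

Peak temporary memory is then roughly proportional to chunk * window rather than to the total number of windows, so chunk can be lowered until the computation fits. The loop runs once per block instead of once per window, which keeps most of the work inside NumPy.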

As taken from "Rolling window for 1D arrays in Numpy?":

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

np.std(rolling_window(list_, 5), 1)

By the way, your vanilla Python code is off by one: it skips the last window. It should be:

std= np.array([list_[i:i+number].std() for i in range(0, len(list_)-number+1)])
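A quick way to check the window count is a tiny array, since n elements contain n - number + 1 full windows of width number. The sketch below uses made-up data; whether dropping the last window counts as a bug depends on what the original code was meant to do.

```python
import numpy as np

data = np.arange(6.0)   # toy data with 6 elements
number = 5

# Stop value used in the question: misses the final full window.
short = [data[i:i + number] for i in range(0, len(data) - number)]
# Stop value with + 1: covers every full window.
full = [data[i:i + number] for i in range(0, len(data) - number + 1)]

print(len(short), len(full))
```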

2 Comments

I am trying to make my code run faster. Is there a way I could write the function without a for loop?
Use the code I wrote in the first box. It should give the results you need without a for loop.
