Calculate moving average in numpy array with NaNs

Question

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:

import numpy as np

def moving_average(a, n=5):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

When calculating with a masked array:

x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
mx = np.ma.masked_array(x,np.isnan(x))
y = moving_average(mx).filled(np.nan)

print(y)

>>> array([3.8,3.8,3.6,nan,nan,nan,2,2.4,nan,nan,nan,2.8,2.6])

The result I am looking for (below) should ideally have NaNs only where the original array, x, had NaNs, and each average should be taken over the number of non-NaN elements in the window. (I also need some way to change the window size n in the function.)

y = array([4.75,4.75,nan,4.4,3.75,2.33,3.33,4,nan,nan,3,3.5,nan,3.25,4,4.5,3])

I could loop over the entire array and check index by index, but the array I am using is very large and that would take a long time. Is there a numpythonic way to do this?

So, is that [4.75,4.75,nan,4.4,3.75,2.33,3.33,4,nan,nan,3,3.5,nan,3.25] the expected output? If so, why is there a NaN as the third element?
– Divakar, Commented Oct 7, 2016 at 14:12
@Divakar It is the expected output. In the original array (x), there is a nan as the third entry.
– krakenwagon, Commented Oct 7, 2016 at 14:15
So why do we have a NaN as the second last entry in the expected output?
– Divakar, Commented Oct 7, 2016 at 14:16
Edited it to show the remaining averages; forgot to add them, sorry.
– krakenwagon, Commented Oct 7, 2016 at 14:19
@Divakar the answer with the np.cumsum approach gave the fastest result with my actual data (changed the accepted answer). All of the answers gave the result I wanted.
– krakenwagon, Commented Oct 7, 2016 at 17:54

slevin886 · Accepted Answer · 2019-04-24 17:27:18Z

2

Pandas has a lot of really nice functionality for this. For example:

import numpy as np
import pandas as pd

x = np.array([np.nan, np.nan, 3, 3, 3, np.nan, 5, 7, 7])

# requires three valid values in a row or the resulting value is null
print(pd.Series(x).rolling(3).mean())

# output values:
# nan, nan, nan, nan, 3, nan, nan, nan, 6.333

# only requires 2 valid values out of three for a size-3 window
print(pd.Series(x).rolling(3, min_periods=2).mean())

# output values:
# nan, nan, nan, 3, 3, 3, 4, 6, 6.3333

You can experiment with the window size and min_periods, and fill in the nulls (for example with fillna) in the same chained expression.
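Note that rolling uses a trailing window (each value summarises the window ending at that index), while the expected output in the question averages over the window starting at each index. Here is a minimal sketch of one way to get the forward-looking result, assuming it is acceptable to reverse the series before and after the rolling step. The function name forward_rolling_nanmean is made up for illustration and is not part of pandas:

```python
import numpy as np
import pandas as pd


def forward_rolling_nanmean(x, window=5):
    """Forward-looking NaN-aware moving average (hypothetical helper)."""
    s = pd.Series(np.asarray(x, dtype=float))
    # Reverse, take the usual trailing rolling mean, then reverse back,
    # so each value averages the window that starts at its index.
    # min_periods=1 lets a window with at least one valid value produce a mean.
    fwd = s[::-1].rolling(window, min_periods=1).mean()[::-1]
    # Keep NaN wherever the input itself was NaN.
    return fwd.where(s.notna()).to_numpy()


x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
y = forward_rolling_nanmean(x, window=5)
print(y)
```

With the array from the question this should give 4.75 for the first two entries and NaN exactly where x is NaN; near the end of the array the window simply shrinks to the elements that remain.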

answered Apr 24, 2019 at 17:27


jerry · Accepted Answer · 2016-10-07 18:32:46Z

1

I'll just add to the great earlier answers that you can still use cumsum to achieve this:

import numpy as np

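def moving_average(a, n=5):
    # a is a masked array; masked (NaN) entries contribute 0 to the running sum
    ret = np.cumsum(a.filled(0), dtype=float)
    # sum over the (up to) n elements ending at each index
    ret[n:] = ret[n:] - ret[:-n]
    # number of unmasked elements in the same window
    counts = np.cumsum(~a.mask)
    counts[n:] = counts[n:] - counts[:-n]
    # average over the valid elements only
    ret[~a.mask] /= counts[~a.mask]
    # keep NaN where the input was NaN
    ret[a.mask] = np.nan

    return ret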

x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
mx = np.ma.masked_array(x,np.isnan(x))
y = moving_average(mx)
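The function above averages over the window ending at each index, whereas the expected output in the question uses the window starting at each index. Here is a sketch of a forward-looking variant of the same cumulative-sum idea, assuming a plain float array with NaNs rather than a masked array. The name forward_moving_average is illustrative only:

```python
import numpy as np


def forward_moving_average(x, n=5):
    """NaN-aware mean of x[i:i+n] for every i (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    # Prefix sums with a leading zero, so the sum over x[i:j] is csum[j] - csum[i].
    csum = np.concatenate(([0.0], np.cumsum(np.where(valid, x, 0.0))))
    ccnt = np.concatenate(([0], np.cumsum(valid)))
    start = np.arange(x.size)
    stop = np.minimum(start + n, x.size)  # windows shrink at the end of the array
    sums = csum[stop] - csum[start]
    counts = ccnt[stop] - ccnt[start]
    out = np.full(x.shape, np.nan)
    # Wherever x is valid, the window contains at least that one element.
    out[valid] = sums[valid] / counts[valid]
    return out


x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
y_forward = forward_moving_average(x, n=5)
print(y_forward)
```

Because only two cumulative sums and a few fancy-indexing steps are involved, the cost stays linear in the array length regardless of n.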

edited Oct 7, 2016 at 18:32

answered Oct 7, 2016 at 15:15


P. Camilleri · Accepted Answer · 2016-10-07 14:19:02Z

0

You could create a temporary array and use np.nanmean() (new in NumPy 1.8, if I'm not mistaken):

import numpy as np
# stack the five shifted slices vertically, so that column j holds the window x[j:j+5]
temp = np.vstack([x[i:len(x) - 4 + i] for i in range(5)])
means = np.nanmean(temp, axis=0)

and put the original NaNs back in place with means[np.isnan(x[:-4])] = np.nan

However, this looks redundant in terms of both memory (the same array is stacked five times, shifted) and computation.

edited Oct 7, 2016 at 14:19

answered Oct 7, 2016 at 14:07


2 Comments

krakenwagon Over a year ago

np.nanmean() does not return nan anywhere in the output array.

P. Camilleri Over a year ago

@krakenwagon, yes, you add them back with the line I edited right before your comment.

N1B4 · Accepted Answer · 2016-10-07 14:26:02Z

0

If I understand correctly, you want to compute a moving average and then set the resulting elements to NaN wherever the original array had a NaN.

>>> import numpy as np

>>> inc = 5  # the moving avg increment (window size)

>>> x = np.array([1.,3,np.nan,7,8,1,2,4,np.nan,np.nan,4,4,np.nan,1,3,6,3])
>>> mov_avg = np.array([np.nanmean(x[idx:idx+inc]) for idx in range(len(x))])

# Determine indices in x that are nans 
>>> nan_idxs = np.where(np.isnan(x))[0]

# Populate output array with nans
>>> mov_avg[nan_idxs] = np.nan
>>> mov_avg
array([ 4.75, 4.75, nan, 4.4, 3.75, 2.33333333, 3.33333333, 4., nan, nan, 3., 3.5, nan, 3.25, 4., 4.5, 3.])

answered Oct 7, 2016 at 14:26


Divakar · Accepted Answer · 2016-10-07 14:29:39Z

0

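Here's an approach using strides:

import numpy as np
from numpy.lib.stride_tricks import as_strided

w = 5  # Window size
n = x.strides[0]  # byte step between consecutive elements of x
# Full windows: row i is a view of x[i:i+w]
avgs = np.nanmean(as_strided(x, shape=(x.size - w + 1, w), strides=(n, n)), axis=1)

# Last w-1 positions: pad the tail with NaN so the partial windows can be averaged too
x_rem = np.append(x[-w + 1:], np.full(w - 1, np.nan))
avgs_rem = np.nanmean(as_strided(x_rem, shape=(w - 1, w), strides=(n, n)), axis=1)
avgs = np.append(avgs, avgs_rem)
avgs[np.isnan(x)] = np.nan  # restore NaN where x was NaN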
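On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view builds the same kind of windowed view without hand-written strides. The following is a rough sketch of the idea rather than a drop-in replacement for the code above. It assumes a 1-D float array, and the function name windowed_nanmean is invented here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_nanmean(x, w=5):
    """Mean of the non-NaN values in x[i:i+w] for every i (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Pad the tail with NaN so every index gets a window of length w.
    padded = np.concatenate([x, np.full(w - 1, np.nan)])
    windows = sliding_window_view(padded, w)  # read-only view, shape (x.size, w)
    valid = ~np.isnan(windows)
    counts = valid.sum(axis=1)
    sums = np.where(valid, windows, 0.0).sum(axis=1)
    out = np.full(x.shape, np.nan)
    keep = ~np.isnan(x)  # only fill positions where x itself is valid
    out[keep] = sums[keep] / counts[keep]
    return out


x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
avgs_view = windowed_nanmean(x, w=5)
print(avgs_view)
```

Summing and counting explicitly, instead of calling np.nanmean on the windows, avoids the "Mean of empty slice" warning when a window holds only NaNs.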

answered Oct 7, 2016 at 14:29


Roux · Accepted Answer · 2021-08-19 11:38:26Z

0

Currently, the bottleneck package should do the trick quite reliably and quickly. Here is a slightly adjusted example from https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.move_mean:

>>> import numpy as np
>>> import bottleneck as bn
>>> a = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
>>> bn.move_mean(a, window=2)
array([ nan,  1.5,  2.5,  nan,  nan])
>>> bn.move_mean(a, window=2, min_count=1)
array([ 1. ,  1.5,  2.5,  3. ,  5. ])

Note that the resulting means correspond to the last index of the window.

The package is available from the Ubuntu repositories, pip, etc. It can operate along an arbitrary axis of a NumPy array. Besides that, it is claimed to be faster than a plain NumPy implementation in many cases.

answered Aug 19, 2021 at 11:38
