Finding moving average from data points in Python

Question

I am playing with Python a bit again, and I found a neat book with examples. One of the exercises is to plot some data. I have a .txt file with two columns of data, and I plotted it just fine, but the exercise then says: Modify your program further to calculate and plot the running average of the data, defined by:

$Y_k=\frac{1}{2r}\sum_{m=-r}^r y_{k+m}$

where r=5 in this case (and the y_k is the second column in the data file). Have the program plot both the original data and the running average on the same graph.

So far I have this:

from pylab import plot, ylim, xlim, show, xlabel, ylabel
from numpy import linspace, loadtxt

data = loadtxt("sunspots.txt", float)
r=5.0

x = data[:,0]
y = data[:,1]

plot(x,y)
xlim(0,1000)
xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
show()

So how do I calculate the sum? In Mathematica it's simple since it's symbolic manipulation (Sum[i, {i,0,10}], for example), but how do I calculate a sum in Python that takes every ten points in the data, averages them, and keeps doing so until the end of the data?
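A direct translation of the formula into Python can use the built-in sum() on a slice of the array. The sketch below is illustrative only: the function name running_average and its divisor parameter are hypothetical, y is assumed to be a 1-D sequence such as data[:,1], and only the indices k for which every term y[k-r] ... y[k+r] exists are computed. Because the sum runs from m = -r to m = r, it has 2r+1 terms; the formula as quoted divides by 2r, so the divisor is left as a parameter in case the plain mean (divide by 2r+1) is wanted.

```python
def running_average(y, r, divisor=None):
    """Y_k = (1/divisor) * sum_{m=-r}^{r} y[k+m], for r <= k < len(y) - r.

    divisor defaults to 2*r, as in the exercise's formula; pass 2*r + 1
    to get the ordinary mean of the 2r+1 samples in each window.
    """
    if divisor is None:
        divisor = 2 * r
    Y = []
    for k in range(r, len(y) - r):
        # The slice y[k-r : k+r+1] holds the 2r+1 terms y[k-r], ..., y[k+r]
        Y.append(sum(y[k - r:k + r + 1]) / divisor)
    return Y


# Usage with a short sample list (not the sunspot data)
values = [1, 5, 7, 2, 6, 7, 8, 2, 2, 7, 8, 3, 7, 3, 7, 3, 15, 6]
print(running_average(values, 2, divisor=5))
```

To plot Y on the same graph as the original data, pair it with x[r:len(x)-r], which has the same length as the returned list.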

I looked at the book, but found nothing that would explain this :\

heltonbiker's code did the trick ^^ :D

from __future__ import division
from pylab import plot, xlim, show, xlabel, ylabel, grid
from numpy import loadtxt
import numpy

data = loadtxt("sunspots.txt", float)

def movingaverage(interval, window_size):
    window= numpy.ones(int(window_size))/float(window_size)
    return numpy.convolve(interval, window, 'same')

x = data[:,0]
y = data[:,1]


plot(x,y,"k.")
y_av = movingaverage(y, 10)
plot(x, y_av,"r")
xlim(0,1000)
xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
grid(True)
show()

And I got this: [plot of the data with its running average; image not included]

Thank you very much ^^ :)

That's weird. Since we don't have your .txt file, it's not possible to test here, but I think the xlim line should not be used (just in case). – heltonbiker, Commented Jul 5, 2012 at 21:11
I got the points from www-personal.umich.edu/~mejn/computational-physics/sunspots.dat, and removing xlim didn't help :\ – dingo_d, Commented Jul 5, 2012 at 21:14
I made a mistake in the code! You have to perform the average on the y array, not x: y_av = movingaverage(y, r), then plot(x, y_av). And you can use xlim again, I think. – heltonbiker, Commented Jul 5, 2012 at 21:20
I think we need to use "valid" instead of "same" here: return numpy.convolve(interval, window, 'same') – ekta, Commented Oct 29, 2014 at 4:12

Roman Kh · Accepted Answer · 2015-12-21 02:00:15Z

116

Because numpy.convolve is fairly slow, those who need a fast solution might prefer the cumsum approach, which is also easier to understand. Here is the code:

import numpy
# Prepend a 0 so that cumsum_vec[i] is the sum of the first i values of data
cumsum_vec = numpy.cumsum(numpy.insert(data, 0, 0))
ma_vec = (cumsum_vec[window_width:] - cumsum_vec[:-window_width]) / window_width

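where data is a one-dimensional array with your data (y in the question), and ma_vec will contain the moving averages over windows of window_width consecutive samples, len(data) - window_width + 1 values in total.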
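The sketch below wraps those two lines in a function and checks them against numpy.convolve in 'valid' mode; the function name moving_average_cumsum is hypothetical. Prepending a 0 makes cumsum_vec[i] equal to the sum of the first i samples, so cumsum_vec[i + w] - cumsum_vec[i] is the sum of data[i:i+w]; without the leading 0 the very first window would be lost. One trade-off to keep in mind: each average is the difference of two partial sums that grow along the series, so very long series can accumulate some floating-point rounding error.

```python
import numpy


def moving_average_cumsum(data, window_width):
    """Moving average over windows of window_width samples.

    Returns len(data) - window_width + 1 values: element i is the mean of
    data[i:i + window_width].
    """
    data = numpy.asarray(data, dtype=float)
    # cumsum_vec[i] == sum(data[:i]); the prepended 0 covers i == 0
    cumsum_vec = numpy.cumsum(numpy.insert(data, 0, 0))
    return (cumsum_vec[window_width:] - cumsum_vec[:-window_width]) / window_width


dataset = [1, 5, 7, 2, 6, 7, 8, 2, 2, 7, 8, 3, 7, 3, 7, 3, 15, 6]
ma_vec = moving_average_cumsum(dataset, 5)
# Same numbers as a convolution with a flat window, keeping only full windows
reference = numpy.convolve(dataset, numpy.ones(5) / 5, mode='valid')
print(numpy.allclose(ma_vec, reference))  # True
```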

On average, cumsum is about 30-40 times faster than convolve.

answered Dec 21, 2015 at 2:00

Roman Kh



4 Comments

Petr Fedosov Over a year ago

where is the 'step' parameter?

Gabriel Over a year ago

This is a duplicate of this older question: stackoverflow.com/a/27681394/1391441

scrx2 Over a year ago

Why the numpy.insert(data, 0, 0)? It adds a single 0 at the beginning of data, right?

Adrien Mau Over a year ago

The insert at the beginning only adds a 0, yes. I'm not sure exactly what its purpose is. If you want to pad your array to keep a similar array size, you can use this kind of thing: data = np.pad(data, int(wsize/2), mode='edge') duplicates the edge value, so you don't get values lower than what was originally in your data. If wsize is odd, adding a supplementary y[0] at the beginning of data is then useful.
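Following up on the np.pad suggestion, here is a sketch that keeps the output the same length as the input by repeating the edge values before taking the cumsum-based average. The function name same_length_moving_average is hypothetical, and it assumes an odd window so the window can be centred on each sample; with the leading 0 kept, an odd wsize gives exactly len(data) output values.

```python
import numpy as np


def same_length_moving_average(data, wsize):
    """Centred moving average with the same length as data (wsize must be odd).

    The ends are padded with copies of the first/last value instead of zeros,
    so the edge averages are not pulled towards 0.
    """
    if wsize % 2 == 0:
        raise ValueError("wsize must be odd for a centred window")
    half = wsize // 2
    padded = np.pad(np.asarray(data, dtype=float), half, mode='edge')
    # Prepend 0 so cumsum_vec[i] is the sum of the first i padded samples
    cumsum_vec = np.cumsum(np.insert(padded, 0, 0))
    return (cumsum_vec[wsize:] - cumsum_vec[:-wsize]) / wsize


y = [1, 5, 7, 2, 6, 7, 8, 2, 2, 7, 8, 3, 7, 3, 7, 3, 15, 6]
smoothed = same_length_moving_average(y, 5)
print(len(smoothed) == len(y))  # True
```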

starball · Accepted Answer · 2022-12-24 06:19:07Z

92

Before reading this answer, bear in mind that there is another answer, from Roman Kh, which uses numpy.cumsum and is much, much faster than this one.

One common way to apply a moving/sliding average (or any other sliding-window function) to a signal is with numpy.convolve().

import numpy

def movingaverage(interval, window_size):
    # Flat window whose weights sum to 1, so the convolution is an average
    window = numpy.ones(int(window_size))/float(window_size)
    return numpy.convolve(interval, window, 'same')

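Here, interval is your y array (the values to be smoothed), not x, and window_size is the number of samples to average over. With the 'same' mode the window is centered on each sample, so samples before and after the current one are used to calculate the average; near the ends of the array, numpy.convolve treats the missing neighbours as zeros. Your code would become: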

plot(x,y)
xlim(0,1000)

# Average the y values, not x; the formula's window covers the 2r+1 samples y[k-r]..y[k+r]
y_av = movingaverage(y, 2*r + 1)
plot(x, y_av)

xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
show()
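To see what the mode argument does, the sketch below compares the three numpy.convolve modes on a short array. The movingaverage variant here is a hypothetical tweak that exposes mode as a parameter. 'full' returns every partial overlap (longer than the input), 'same' returns as many values as the input but averages the edge positions against implicit zeros, and 'valid' returns only positions where the window fits entirely, i.e. window_size - 1 fewer values than the input.

```python
import numpy


def movingaverage(interval, window_size, mode='same'):
    # Flat averaging window; the convolution mode is exposed as a parameter
    window = numpy.ones(int(window_size)) / float(window_size)
    return numpy.convolve(interval, window, mode)


y = numpy.array([1, 5, 7, 2, 6, 7, 8, 2, 2, 7, 8, 3, 7, 3, 7, 3, 15, 6], dtype=float)
for mode in ('full', 'same', 'valid'):
    out = movingaverage(y, 5, mode)
    print(mode, len(out), out[:3])
```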

edited Dec 24, 2022 at 6:19

starball♦


answered Jul 5, 2012 at 20:37

heltonbiker


6 Comments

dingo_d Over a year ago

Here I get an error:
Traceback (most recent call last):
  File "C:/Users/*****/Desktop/sunspots_plot.py", line 18, in <module>
    x_av = movingaverage(x, 5)
  File "C:/Users/*****/Desktop/sunspots_plot.py", line 8, in movingaverage
    window= numpy.ones(int(window_size))/float(window_size)
NameError: global name 'numpy' is not defined

heltonbiker Over a year ago

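Well, that means you didn't import numpy. In fact, you imported just some functions from it: linspace and loadtxt. Since the function calls numpy.ones and numpy.convolve, you need to add import numpy as well (or import ones and convolve and drop the numpy. prefix) ;o)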

dingo_d Over a year ago

I edited my code and now I have the image, but the average only shows up on the last part of the graph. Should I manually change interval to sort that out?

Roman Kh Over a year ago

The problem is that convolve is extremely slow. In my separate answer to this question you can find a much faster solution based on numpy.cumsum().

Lee Over a year ago

I'm finding that this solution works very well, but does not work at the edges of the data. It adds spurious low values.
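One way to avoid the spurious low values at the edges while keeping the output aligned with x is to divide each windowed sum by the number of real samples that fell inside the window, rather than by window_size. This is a different technique from the one in the answer, offered as a sketch; the function name movingaverage_edge_corrected is hypothetical. Near the ends the window is effectively shorter, so those values are noisier, but they are no longer biased toward zero.

```python
import numpy


def movingaverage_edge_corrected(interval, window_size):
    interval = numpy.asarray(interval, dtype=float)
    window = numpy.ones(int(window_size))
    # Sum of the samples under the window (zeros are implied past the ends)
    sums = numpy.convolve(interval, window, 'same')
    # How many real samples were under the window at each position
    counts = numpy.convolve(numpy.ones_like(interval), window, 'same')
    return sums / counts


y = [1, 5, 7, 2, 6, 7, 8, 2, 2, 7, 8, 3, 7, 3, 7, 3, 15, 6]
print(movingaverage_edge_corrected(y, 5)[:3])
```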


Andy Hayden · Accepted Answer · 2012-12-30 18:22:01Z

32

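A moving average is a convolution, and numpy will be faster than most pure Python operations. This will give you the 10-point moving average of the one-dimensional y column (mode='valid' keeps only the positions where all 10 points are available):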

import numpy as np
y = data[:, 1]  # np.convolve needs a one-dimensional array
smoothed = np.convolve(y, np.ones(10)/10, mode='valid')

I would also strongly suggest using the great pandas package if you are working with timeseries data. There are some nice moving average operations built in.
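For example, with pandas a centred 11-point running average (r = 5 samples on either side, as in the question's formula) could be written as below. This is a sketch assuming pandas is installed and using stand-in data; Series.rolling with window and center is the pandas API for this, and min_periods=1 is one possible choice for the edges (it averages whatever samples are available instead of returning NaN).

```python
import numpy as np
import pandas as pd

r = 5
# Stand-in data; in the question this would be y = data[:, 1]
y = np.arange(30, dtype=float)

# Centred window of 2r+1 samples; min_periods=1 fills the edges with partial averages
y_av = pd.Series(y).rolling(window=2 * r + 1, center=True, min_periods=1).mean()
print(y_av.head())
```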

edited Dec 30, 2012 at 18:22

Andy Hayden


answered Jul 5, 2012 at 20:41

reptilicus


2 Comments

dingo_d Over a year ago

I get an error:
Traceback (most recent call last):
  File "C:/Users/*****/Desktop/sunspots_plot.py", line 7, in <module>
    smoothed = np.convolve(data, np.ones(10)/(10))
  File "C:\Python26\lib\site-packages\numpy\core\numeric.py", line 787, in convolve
    return multiarray.correlate(a, v[::-1], mode)
ValueError: object too deep for desired array

reptilicus Over a year ago

That's because data in your case is a two-dimensional numpy array, and you should be passing a one-dimensional array. In your case, it would be smoothed = np.convolve(y, np.ones(10)/10)

user206545 · Accepted Answer · 2012-07-05 20:36:15Z

4

ravgs = [sum(data[i:i+5])/5. for i in range(len(data)-4)]

This isn't the most efficient approach, but it will give you your answer. I'm unclear whether your window is 5 points or 10; if it's 10, replace each 5 with 10 and the 4 with 9.

answered Jul 5, 2012 at 20:36

user206545

Comments

ekta · Accepted Answer · 2014-10-29 07:16:55Z

4

There is a problem with the accepted answer. I think we need to use "valid" instead of "same" in return numpy.convolve(interval, window, 'same').

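As an example, try the moving average of this data set with a window of 5: [1,5,7,2,6,7,8,2,2,7,8,3,7,3,7,3,15,6]. The result should be [4.2,5.4,6.0,5.0,5.0,5.2,5.4,4.4,5.4,5.6,5.6,4.6,7.0,6.8], but "same" gives [2.6,3.0,4.2,5.4,6.0,5.0,5.0,5.2,5.4,4.4,5.4,5.6,5.6,4.6,7.0,6.8,6.2,4.8]: the interior values agree, but the extra two values at each end are averaged against the zeros numpy.convolve pads the data with, so they come out too low.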

Rusty code to try this out:

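import numpy

def movingaverage(interval, window_size, mode='valid'):
    window = numpy.ones(int(window_size))/float(window_size)
    return numpy.convolve(interval, window, mode)

result = []
dataset = [1,5,7,2,6,7,8,2,2,7,8,3,7,3,7,3,15,6]
window_size = 5
# Average every run of window_size consecutive values by hand
for index in range(len(dataset) - window_size + 1):
    tmp = sum(dataset[index:index + window_size]) / float(window_size)
    result.append(tmp)

print(result)
print(movingaverage(dataset, window_size, 'valid'))  # change to 'same' to compare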

Try this with valid & same and see whether the math makes sense.

See also: http://sentdex.com/sentiment-analysisbig-data-and-python-tutorials-algorithmic-trading/how-to-chart-stocks-and-forex-doing-your-own-financial-charting/calculate-simple-moving-average-sma-python/

edited Oct 29, 2014 at 7:16

answered Oct 29, 2014 at 4:27

ekta


2 Comments

ekta Over a year ago

@dingo_d Why don't you quickly try this out with the rusty code and the sample data set (as a simple list) I posted? For lazy people (like I was at first), "same" masks the fact that the moving average is incorrect at the edges. You should probably consider editing your original answer. I tried it just yesterday, and double-checking saved me from looking bad when reporting to CxO level. All you need to do is try your moving average once with "valid" and once with "same", and once you are convinced, give me some love (aka an upvote).

dingo_d Over a year ago

I'm sorry I haven't gotten back to you; I couldn't get Python to work on my computer back then, so I forgot about this. I've installed it again, tried 'valid' in convolve, and got the error ValueError: x and y must have same first dimension. I checked the lengths of my arrays and they were the same. I even did x = numpy.array(data[:,0]) and y = numpy.array(data[:,1]), but I still got the same error.
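That ValueError is what matplotlib's plot() raises when x and y have different lengths, and with 'valid' the averaged array is window_size - 1 elements shorter than x. One way to line them up, sketched here assuming an odd window of 2r+1 samples, is to drop r samples from each end of x so that each average is plotted at the centre of its window. The arrays below are stand-ins for data[:,0] and data[:,1].

```python
import numpy

r = 5
window_size = 2 * r + 1

# Stand-ins for x = data[:,0] and y = data[:,1]
x = numpy.arange(100, dtype=float)
y = numpy.sin(x / 10.0)

window = numpy.ones(window_size) / window_size
y_av = numpy.convolve(y, window, 'valid')   # len(y) - window_size + 1 values
x_av = x[r:len(x) - r]                      # x at the centre of each full window

print(len(x_av) == len(y_av))  # True
# plot(x, y) and plot(x_av, y_av) can now share the same axes
```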

dreadsci · Accepted Answer · 2012-07-05 20:39:31Z

1

I think something like:

aves = [sum(data[i:i+6]) for i in range(0, len(data), 5)]

But I always have to double check the indices are doing what I expect. The range you want is (0, 5, 10, ...) and data[0:6] will give you data[0]...data[5]

ETA: oops, and you want ave rather than sum, of course. So actually using your code and the formula:

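import numpy
r = 5
x = data[:,0]
y1 = data[:,1]
# Mean of each block of 2r samples (non-overlapping), one block every 2r points
y2 = [numpy.mean(y1[i-r:i+r]) for i in range(r, len(y1), 2*r)]
x2 = x[r::2*r]  # x value near the centre of each block, so len(x2) == len(y2)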

edited Jul 5, 2012 at 20:39

answered Jul 5, 2012 at 20:29

dreadsci


2 Comments

dreadsci Over a year ago

And anyway, y1 has len(y1) points and y2 has len(y1)/2r points so...you want to add them separately to the graph. Go with the convolve solutions instead!

dingo_d Over a year ago

Again, for y2 I get that they are [array[number, number], array[number, number]...] :\ I need to get numbers to plot :\

eestrada · Accepted Answer · 2015-12-23 23:50:25Z

1

My moving average function, without numpy:

from __future__ import division  # must be on first line of script

class Solution:
    def Moving_Avg(self,A):
        m = A[0]
        B = []
        B.append(m)
        for i in range(1,len(A)):
            m = (m * i + A[i])/(i+1)
            B.append(m)
        return B
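Note that this loop computes the running mean of all samples seen so far (sometimes called a cumulative moving average): after step i, m is the mean of A[0] through A[i] rather than of a fixed-width window. The sketch below shows a vectorised numpy equivalent; the function name cumulative_average is hypothetical.

```python
import numpy


def cumulative_average(A):
    """Element i is the mean of A[0], ..., A[i] (a cumulative moving average)."""
    A = numpy.asarray(A, dtype=float)
    return numpy.cumsum(A) / numpy.arange(1, len(A) + 1)


A = [1, 5, 7, 2, 6]
print(cumulative_average(A))
# Same numbers as averaging each growing prefix explicitly
print([sum(A[:i + 1]) / (i + 1) for i in range(len(A))])
```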

edited Dec 23, 2015 at 23:50

eestrada


answered Dec 23, 2015 at 22:07

Armanda_An


2 Comments

Armanda_An Over a year ago

Sorry, I forgot to add the first line: from __future__ import division. Otherwise the output will be int instead of float (in Python 2).

Bennett Brown Over a year ago

@Armanda_An, you can force float division in Python 2 by adding a decimal point to the 1: m = (m * i + A[i])/(i+1.)
