64

I am playing in Python a bit again, and I found a neat book with examples. One of the examples is to plot some data. I have a .txt file with two columns and I have the data. I plotted the data just fine, but in the exercise it says: Modify your program further to calculate and plot the running average of the data, defined by:

$Y_k=\frac{1}{2r}\sum_{m=-r}^r y_{k+m}$

where r=5 in this case (and the y_k is the second column in the data file). Have the program plot both the original data and the running average on the same graph.

So far I have this:

from pylab import plot, ylim, xlim, show, xlabel, ylabel
from numpy import linspace, loadtxt

data = loadtxt("sunspots.txt", float)
r=5.0

x = data[:,0]
y = data[:,1]

plot(x,y)
xlim(0,1000)
xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
show()

So how do I calculate the sum? In Mathematica it's simple since it's symbolic manipulation (Sum[i, {i,0,10}] for example), but how to calculate sum in python which takes every ten points in the data and averages it, and does so until the end of points?

I looked at the book, but found nothing that would explain this :\


heltonbiker's code did the trick ^^ :D

from __future__ import division
from pylab import plot, ylim, xlim, show, xlabel, ylabel, grid
from numpy import linspace, loadtxt, ones, convolve
import numpy as numpy

data = loadtxt("sunspots.txt", float)

def movingaverage(interval, window_size):
    window= numpy.ones(int(window_size))/float(window_size)
    return numpy.convolve(interval, window, 'same')

x = data[:,0]
y = data[:,1]


plot(x,y,"k.")
y_av = movingaverage(y, 10)
plot(x, y_av,"r")
xlim(0,1000)
xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
grid(True)
show()

And I got this:

image

Thank you very much ^^ :)

4
  • 1
    That's weird. Since we don't have your txt file, it's not possible to test here, but I think the xlim line should not be used (just in case) Commented Jul 5, 2012 at 21:11
  • I got the points from here: www-personal.umich.edu/~mejn/computational-physics/sunspots.dat And removing xlim didn't help :\ Commented Jul 5, 2012 at 21:14
  • 2
    I made a mistake in the code! you have to perform the average on the y array, not x: y_av = movingaverage(y, r) plot(x, y_av). And you can use xlim again, I think. Commented Jul 5, 2012 at 21:20
  • 1
    I think we need to use "valid" instead of "same" here - return numpy.convolve(interval, window, 'same') Commented Oct 29, 2014 at 4:12

7 Answers 7

116

As numpy.convolve is pretty slow, those who need a fast performing solution might prefer an easier to understand cumsum approach. Here is the code:

cumsum_vec = numpy.cumsum(numpy.insert(data, 0, 0)) 
ma_vec = (cumsum_vec[window_width:] - cumsum_vec[:-window_width]) / window_width

where data contains your data, and ma_vec will contain moving averages of window_width length.

On average, cumsum is about 30-40 times faster than convolve.

Sign up to request clarification or add additional context in comments.

4 Comments

where is the 'step' parameter?
This is a duplicate of this older question:stackoverflow.com/a/27681394/1391441
why the numpy.insert(data, 0, 0)? It adds a single 0 at the beginning of data, right?
The insert at the beginning only adds 0 yes. I'm not sure what exactly is its purpose. If you want to pad your array to keep a similar array size you may use this king of thing. Typically : data = np.pad( data, int(wsize/2) , mode='edge') will allow to duplicate the edge value and not get a lower value than originally being in your data. If wsize is odd adding a supplementary y[0] at the beginning of data is then useful.
92

Before reading this answer, bear in mind that there is another answer below, from Roman Kh, which uses numpy.cumsum and is MUCH MUCH FASTER than this one.


Best One common way to apply a moving/sliding average (or any other sliding window function) to a signal is by using numpy.convolve().

def movingaverage(interval, window_size):
    window = numpy.ones(int(window_size))/float(window_size)
    return numpy.convolve(interval, window, 'same')

Here, interval is your x array, and window_size is the number of samples to consider. The window will be centered on each sample, so it takes samples before and after the current sample in order to calculate the average. Your code would become:

plot(x,y)
xlim(0,1000)

x_av = movingaverage(interval, r)
plot(x_av, y)

xlabel("Months since Jan 1749.")
ylabel("No. of Sun spots")
show()

6 Comments

Here I get error: Traceback (most recent call last): File "C:/Users/*****/Desktop/sunspots_plot.py", line 18, in <module> x_av = movingaverage(x, 5) File "C:/Users/*****/Desktop/sunspots_plot.py", line 8, in movingaverage window= numpy.ones(int(window_size))/float(window_size) NameError: global name 'numpy' is not defined
Well, that means you didn't import numpy. In fact, you imported just some functions from it: linspace and loadtxt. You should add ones and convolve to that ;o)
I edited my code and now I have the image, but the average is only on last part of the graph, should I manually change interval to sort that out?
The problem is that convolve is extremely slow. Below you may find a much faster solution based on numpy.cumsum().
I'm finding that this solution works very well, but does not work at the edges of the data. It adds spurious low values.
|
32

A moving average is a convolution, and numpy will be faster than most pure python operations. This will give you the 10 point moving average.

import numpy as np
smoothed = np.convolve(data, np.ones(10)/10)

I would also strongly suggest using the great pandas package if you are working with timeseries data. There are some nice moving average operations built in.

2 Comments

I get Error: Traceback (most recent call last): File "C:/Users/*****/Desktop/sunspots_plot.py", line 7, in <module> smoothed = np.convolve(data, np.ones(10)/(10)) File "C:\Python26\lib\site-packages\numpy\core\numeric.py", line 787, in convolve return multiarray.correlate(a, v[::-1], mode) ValueError: object too deep for desired array
Thats b/c data in your case is a multiple dimension numpy array, and you should be passing a one dimension array. In your case, it would be smoothed = np.convolve(y, np.ones/10)
4
ravgs = [sum(data[i:i+5])/5. for i in range(len(data)-4)]

This isn't the most efficient approach but it will give your answer and I'm unclear if your window is 5 points or 10. If its 10, replace each 5 with 10 and the 4 with 9.

Comments

4

There is a problem with the accepted answer. I think we need to use "valid" instead of "same" here - return numpy.convolve(interval, window, 'same') .

As an Example try out the MA of this data-set = [1,5,7,2,6,7,8,2,2,7,8,3,7,3,7,3,15,6] - the result should be [4.2,5.4,6.0,5.0,5.0,5.2,5.4,4.4,5.4,5.6,5.6,4.6,7.0,6.8], but having "same" gives us an incorrect output of [2.6,3.0,4.2,5.4,6.0,5.0,5.0,5.2,5.4,4.4,5.4,5.6,5.6, 4.6,7.0,6.8,6.2,4.8]

Rusty code to try this out -:

result=[]
dataset=[1,5,7,2,6,7,8,2,2,7,8,3,7,3,7,3,15,6]
window_size=5
for index in xrange(len(dataset)):
    if index <=len(dataset)-window_size :
        tmp=(dataset[index]+ dataset[index+1]+ dataset[index+2]+ dataset[index+3]+ dataset[index+4])/5.0
        result.append(tmp)
    else:
      pass

result==movingaverage(y, window_size) 

Try this with valid & same and see whether the math makes sense.

See also -: http://sentdex.com/sentiment-analysisbig-data-and-python-tutorials-algorithmic-trading/how-to-chart-stocks-and-forex-doing-your-own-financial-charting/calculate-simple-moving-average-sma-python/

2 Comments

@dingo_d Why don't you quickly try this out with the rusty code (and the sample data-set(as a simple list), I posted ? For some lazy people(like I had been at first) - its masks out the fact that moving average is incorrect.Probably you should consider editing your original answer. I tried it just yesterday and double checking saved me face from looking bad at reporting to Cxo level. All you need to do, is to try your same moving average once with "valid" and other time with "same" - and once you are convinced give me some love(aka-up-vote)
I'm sorry I haven't gotten back to you, I couldn't get the Python to work on my comp back then so I forgot about this. I've installed it again, and I tried to put the 'valid' in convolve, and got the error ValueError: x and y must have same first dimension. I checked the length of my array and they were the same. I even did the x = numpy.array(data[:,0]) y = numpy.array(data[:,1]), but I still got the same error.
1

I think something like:

aves = [sum(data[i:i+6]) for i in range(0, len(data), 5)]

But I always have to double check the indices are doing what I expect. The range you want is (0, 5, 10, ...) and data[0:6] will give you data[0]...data[5]

ETA: oops, and you want ave rather than sum, of course. So actually using your code and the formula:

r = 5
x = data[:,0]
y1 = data[:,1]
y2 = [ave(y1[i-r:i+r]) for i in range(r, len(y1), 2*r)]
y = [y1, y2]

2 Comments

And anyway, y1 has len(y1) points and y2 has len(y1)/2r points so...you want to add them separately to the graph. Go with the convolve solutions instead!
Again, for y2 I get that they are [array[number, number], array[number, number]...] :\ I need to get numbers to plot :\
1

My Moving Average function, without numpy function:

from __future__ import division  # must be on first line of script

class Solution:
    def Moving_Avg(self,A):
        m = A[0]
        B = []
        B.append(m)
        for i in range(1,len(A)):
            m = (m * i + A[i])/(i+1)
            B.append(m)
        return B

2 Comments

Sorry to add the first line: from future import division. Otherwise the output will be int instead of float
@Arnanda_An, You can force float division in Python 2 by using a decimal point in the 1: m = (m * i + A[i])/(i+1.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.