
I've read here that matplotlib is good at handling large data sets. I'm writing a data processing application, have embedded matplotlib plots into wx, and have found matplotlib to be TERRIBLE at handling large amounts of data, both in terms of speed and in terms of memory. Does anyone know a way to speed up matplotlib (or reduce its memory footprint) other than downsampling your inputs?

To illustrate how bad matplotlib is with memory, consider this code:

import numpy
import pylab

a = numpy.arange(int(1e7))  # only 10,000,000 integers: ~40 MB as 32-bit ints
                            # (most 64-bit platforms default to int64, ~80 MB)
# watch your system memory now...
pylab.plot(a)  # this uses over 230 ADDITIONAL MB of memory
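For reference, one way to reproduce that measurement yourself (a sketch, assuming the third-party memory_profiler package is installed; not part of the original question):

from memory_profiler import memory_usage
import numpy
import pylab

a = numpy.arange(int(1e7))
# sample the process's memory while pylab.plot runs
usage = memory_usage((pylab.plot, (a,)), interval=0.1)
print("additional memory: %.0f MiB" % (max(usage) - min(usage)))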
10 Comments
  • I've always downsampled. Why would you ever need to try to render 10M points on a graph? Commented Feb 12, 2011 at 4:34
  • matplotlib is slow. It is a known fact. For Qt I use the guiqwt package; maybe there is something like it for wx too. Commented Feb 12, 2011 at 15:59
  • @paul I just wanted to make it easy for my users to explore the data graphically, i.e. when they zoom, I didn't want to have to resample again depending on their zoom bounds; they would see the actual data no matter how they zoomed/panned (see the sketch after these comments). Commented Feb 12, 2011 at 18:53
  • If it's feasible, try not plotting things with lines connecting them... plt.plot(a, 'b.') will be much faster than the default plt.plot(a, 'b-'). Commented Feb 12, 2011 at 20:23
  • @Joe Kington My tests do not show dots to be faster or less memory intensive than lines. :( Commented Feb 13, 2011 at 0:34
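A minimal sketch of the zoom-aware resampling mentioned in the comment above, assuming the full-resolution data fits in a NumPy array; it uses matplotlib's xlim_changed axes callback, but the constants and structure are illustrative, not the asker's actual code:

import numpy as np
import matplotlib.pyplot as plt

MAX_POINTS = 10_000  # never hand the line artist more vertices than this

x = np.arange(int(1e7))
y = np.random.randn(int(1e7)).cumsum()

fig, ax = plt.subplots()
line, = ax.plot(x[::1000], y[::1000])  # coarse initial view

def on_xlim_changed(ax):
    # re-slice the full-resolution data to the visible range, then stride it
    # down so the artist always holds at most MAX_POINTS vertices
    lo, hi = ax.get_xlim()
    i0, i1 = np.searchsorted(x, [lo, hi])
    step = max((i1 - i0) // MAX_POINTS, 1)
    line.set_data(x[i0:i1:step], y[i0:i1:step])
    ax.figure.canvas.draw_idle()

ax.callbacks.connect('xlim_changed', on_xlim_changed)
plt.show()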

3 Answers


Downsampling is a good solution here: plotting 10M points consumes a lot of memory and time in matplotlib. If you know how much memory is acceptable, you can downsample based on that budget. For example, say 1M points takes 23 additional MB of memory and you find that acceptable in terms of space and time; then you should downsample so that the point count always stays below 1M:

import scipy.signal

MAX_POINTS = 1_000_000
if len(a) > MAX_POINTS:
    # decimate() low-pass filters before downsampling to limit aliasing
    a = scipy.signal.decimate(a, int(len(a) / MAX_POINTS) + 1)
pylab.plot(a)

Or something along those lines (the snippet above may downsample more aggressively than you'd like).
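If the anti-aliasing filter inside scipy.signal.decimate is more than you need, plain slicing gives the same point-count reduction with no SciPy dependency (a sketch, not part of the original answer; note that it simply drops the skipped samples, spikes included):

MAX_POINTS = 1_000_000
if len(a) > MAX_POINTS:
    step = len(a) // MAX_POINTS + 1
    a = a[::step]  # keeps every step-th sample; anything in between vanishes
pylab.plot(a)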


2 Comments

A simple decimation is inadequate, and is what matplotlib does internally as far as I can tell. The reason I don't simply want to decimate is that you lose the extreme values in each decimation interval. If the signal had a sharp spike within an interval, you wouldn't see it on the plot at all unless you were lucky with the interval boundaries. I wrote some code that does this more intelligently, taking the extreme values in each decimation interval instead of the value at the center (or edge) of the interval. I'm accepting your answer though, since this is in principle what I did.
David - if you solved this 'more intelligently', would you mind sharing? You can mark your own answers as 'solved' and may get a few upvotes...
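For reference, a minimal sketch of the spike-preserving decimation described in the comment above (the asker's actual code isn't shown in the thread, so the helper name and interface here are hypothetical):

import numpy as np

def minmax_decimate(y, max_points=1000):
    # keep the min and max of each interval so spikes survive decimation
    n = len(y)
    if n <= max_points:
        return np.arange(n), y
    bins = max_points // 2            # each interval contributes two points
    width = n // bins                 # trailing n % bins samples are dropped
    chunks = y[:bins * width].reshape(bins, width)
    offsets = np.arange(bins) * width
    idx = np.sort(np.concatenate([chunks.argmin(axis=1) + offsets,
                                  chunks.argmax(axis=1) + offsets]))
    return idx, y[idx]

# plot against the surviving indices so the x-axis stays meaningful
ix, yd = minmax_decimate(a, max_points=2000)
pylab.plot(ix, yd)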

I'm often interested in the extreme values too, so before plotting large chunks of data I proceed this way:

import numpy as np

s = np.random.normal(size=int(1e7))  # size must be an integer in current numpy
decimation_factor = 10
# group samples into rows of decimation_factor and keep the max of each group
s = np.max(s.reshape(-1, decimation_factor), axis=1)

# to check the final size
s.shape

Of course, np.max is just one example of an extreme-value function.

P.S. With numpy "stride tricks" it should be possible to avoid copying the data around during the reshape.
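A sketch of that stride-tricks idea using numpy.lib.stride_tricks.as_strided (worth noting: for a contiguous 1-D array the reshape above is already a view, so this mainly helps with non-contiguous inputs):

import numpy as np
from numpy.lib.stride_tricks import as_strided

s = np.random.normal(size=int(1e7))
d = 10  # decimation factor
# view s as (n_windows, d) rows without copying the underlying buffer
windows = as_strided(s, shape=(s.size // d, d),
                     strides=(s.strides[0] * d, s.strides[0]))
peaks = windows.max(axis=1)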



I was interested in preserving one side of a log-sampled plot, so I came up with this (downsample being my first attempt):

import numpy as np

def downsample(x, y, target_length=1000, preserve_ends=0):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    data = np.vstack((x, y))
    if preserve_ends > 0:
        # split off the first/last preserve_ends samples so they survive intact
        l, data, r = np.split(data, (preserve_ends, -preserve_ends), axis=1)
    interval = int(data.shape[1] / target_length) + 1
    data = data[:, ::interval]  # uniform striding of the middle section
    if preserve_ends > 0:
        data = np.concatenate([l, data, r], axis=1)
    return data[0, :], data[1, :]

def geom_ind(stop, num=50):
    # grow the requested count until geomspace yields ~num distinct integers
    geo_num = num
    ind = np.geomspace(1, stop, dtype=int, num=geo_num)
    while len(set(ind)) < num - 1:
        geo_num += 1
        ind = np.geomspace(1, stop, dtype=int, num=geo_num)
    return np.sort(list(set(ind) | {0}))

def log_downsample(x, y, target_length=1000, flip=False):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    data = np.vstack((x, y))
    if flip:
        data = np.fliplr(data)  # flip so the dense end of the geometric sampling swaps sides
    data = data[:, geom_ind(data.shape[1], num=target_length)]
    if flip:
        data = np.fliplr(data)
    return data[0, :], data[1, :]

which allowed me to better preserve one side of the plot:

import matplotlib.pyplot as plt

# x, y: the full-resolution data to plot
newx, newy = downsample(x, y, target_length=1000, preserve_ends=50)
newlogx, newlogy = log_downsample(x, y, target_length=1000)
f = plt.figure()
plt.gca().set_yscale("log")
plt.step(x, y, label="original")
plt.step(newx, newy, label="downsample")
plt.step(newlogx, newlogy, label="log_downsample")
plt.legend()

(plot comparing the original, downsample, and log_downsample traces)

