Matplotlib.pyplot.hist() very slow

Question

I'm plotting about 10,000 items in an array. They take around 1,000 unique values.

The plotting has been running for half an hour now. I made sure the rest of the code works.

Is it that slow? This is my first time plotting histograms with pyplot.

Yes, I would say that is very slow. In practice it depends on how many bins you selected, but, e.g., for 1,000 bins I can plot 10,000 randomly generated values in about a second or two (Python 2, laptop with an Intel Core i5, Ubuntu 14.04). Show some code, it'll make things easier.
– ljetibo, Commented Mar 2, 2016 at 4:45
Actually I solved it by just reducing the number of bins. Thanks though.
– Fenwick, Commented Mar 2, 2016 at 4:50
Are you sure you're using the correct column data type? I was using strings instead of integers and that was a sheer error on my part.
– piedpiper, Commented Aug 1, 2019 at 8:57

user545424 · Accepted Answer · 2016-09-19 21:20:38Z

31

To plot histograms quickly with matplotlib, pass the histtype='step' argument to pyplot.hist. For example, with the default histtype:

import matplotlib.pyplot as plt
import numpy as np
plt.hist(np.random.exponential(size=1000000), bins=10000)
plt.show()

takes ~15 seconds to draw and roughly 5-10 seconds to update when you pan or zoom.

In contrast, plotting with histtype='step':

plt.hist(np.random.exponential(size=1000000),bins=10000,histtype='step')
plt.show()

plots almost immediately and can be panned and zoomed with no delay.
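
A likely reason for the difference, offered as a sketch rather than a definitive diagnosis: with the default histtype='bar', hist creates one rectangle artist per bin, while histtype='step' draws a single outline polygon. The snippet below counts the artists and times one render for each style. The Agg backend, the smaller data size, and the helper name draw_hist are assumptions made so the example runs quickly without a display; the timings will differ from the interactive numbers quoted above.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)  # smaller than the answer's 1,000,000
n_bins = 1_000


def draw_hist(histtype):
    """Draw one histogram; return (number of artists, seconds to render)."""
    fig, ax = plt.subplots()
    _, _, patches = ax.hist(data, bins=n_bins, histtype=histtype)
    start = time.perf_counter()
    fig.canvas.draw()  # force a full render of the figure
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return len(patches), elapsed


bar_artists, bar_seconds = draw_hist("bar")
step_artists, step_seconds = draw_hist("step")

print(f"bar:  {bar_artists} artists, {bar_seconds:.3f} s to draw")
print(f"step: {step_artists} artists, {step_seconds:.3f} s to draw")
```

Fewer artists means less work on every redraw, which would also explain why panning and zooming feel smoother with the step style.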

answered Sep 19, 2016 at 21:20

user545424


2 Comments

demented hedgehog Over a year ago

This is much faster as you say (I'm seeing the same times as you). But the graphs look very different with histtype='step'.

user545424 Over a year ago

@dementedhedgehog yes, they do. I guess it depends on which discipline you are in. In high energy physics the step style is the norm. I opened an issue on the matplotlib page to discuss the issue here a while ago: github.com/matplotlib/matplotlib/issues/7121.

CcMango · Accepted Answer · 2019-08-28 07:39:29Z

19

It will be instant to plot the histogram after flattening the numpy array. Try the below demo code:

import matplotlib.pyplot as plt
import numpy as np

array2d = np.random.random_sample((512, 512)) * 100
plt.hist(array2d.flatten())             # default of 10 bins
plt.hist(array2d.flatten(), bins=1000)  # 1000 bins
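
One plausible explanation for the speedup: when hist receives a 2D array, it treats every column as a separate dataset and draws a set of bars for each one, so a 512x512 array becomes 512 histograms side by side. Flattening turns it into a single dataset. The sketch below shows this on a much smaller array; the shape (200, 8), the Agg backend, and the variable names are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
array2d = rng.random((200, 8)) * 100  # small stand-in for the 512x512 array

fig, (ax_2d, ax_flat) = plt.subplots(1, 2)

# 2D input: each of the 8 columns is binned and drawn as its own dataset.
counts_2d, _, _ = ax_2d.hist(array2d, bins=10)

# Flattened input: all 1600 values are binned together as one dataset.
counts_flat, _, _ = ax_flat.hist(array2d.flatten(), bins=10)

n_datasets_2d = len(counts_2d)
shape_flat = np.asarray(counts_flat).shape
print("datasets from 2D input:", n_datasets_2d)
print("shape of counts from flattened input:", shape_flat)
plt.close(fig)
```

If separate histograms per column are actually wanted, passing the columns one at a time keeps that behavior while making the cost explicit.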

answered Aug 28, 2019 at 7:39

CcMango


5 Comments

Jarom Over a year ago

Was having this same issue, this solution worked like a charm.

Eilon Baer Over a year ago

This should be the accepted answer. Handled 100k values instantly as opposed to it not returning otherwise. If plotting multiple histograms, array2d.flatten() does cause the histograms to be plotted as one. Resolution is to add each column separately.

jbcd13 Over a year ago

This should be the accepted answer. Far, far superior to the ones more upvoted.

Khoa LT Over a year ago

This is fantastic! But I wonder why flattening the array would improve the execution time so much?

palapapa Apr 13 at 19:26

It's very weird that this works considering that Matplotlib uses numpy.histogram underneath hist, which already calculates the histogram with the array flattened.

Niko Fohr · Accepted Answer · 2018-03-16 14:07:50Z

7

Importing seaborn somewhere in the code may cause pyplot.hist to take a really long time.

If the problem is seaborn, it can be solved by resetting the matplotlib settings:

import seaborn as sns
sns.reset_orig()

answered Mar 16, 2018 at 14:07

Niko Fohr


Trenton McKinney · Accepted Answer · 2021-11-28 01:35:44Z

3

For me, the problem was that the dtype of my pd.Series, say S, was 'object' rather than 'float64'. After converting it with S = np.float64(S), plt.hist(S) was very quick.

edited Nov 28, 2021 at 1:35

Trenton McKinney


answered Jul 4, 2019 at 0:38

Napoléon


1 Comment

Trenton McKinney Over a year ago

The correct way to change the type of a pandas.Series is with .astype(): S.astype('float64')
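
A minimal sketch of the conversion discussed here, assuming a Series of numbers that were stored as strings. The sample values are made up for illustration. Note that astype() returns a new Series, so the result has to be assigned; pd.to_numeric with errors="coerce" is shown as an alternative for data that may contain unparseable entries.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, forced to dtype object.
S = pd.Series(["1.5", "2.0", "3.25", "4.0"], dtype=object)

# astype() raises if any value cannot be parsed as a float.
as_float = S.astype("float64")

# to_numeric with errors="coerce" turns unparseable values into NaN instead.
coerced = pd.to_numeric(S, errors="coerce")

print(S.dtype, as_float.dtype, coerced.dtype)
```

Checking S.dtype before plotting is a quick way to find out whether this is the cause of a slow histogram.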

Skippy le Grand Gourou · Accepted Answer · 2021-03-03 17:25:31Z

2

Since several answers already mention slowness with the pandas hist() method, note that it may be due to non-numerical data. That is easily solved by using value_counts():

df['colour'].value_counts().plot(kind='bar')

credits
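
For illustration, here is a small self-contained version of the value_counts() approach. The DataFrame contents, the Agg backend, and the axis label are assumptions; only the 'colour' column name comes from the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({"colour": ["red", "blue", "red", "green", "red", "blue"]})

# Count each category once, then draw one bar per category.
counts = df["colour"].value_counts()

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_ylabel("count")

n_bars = len(ax.patches)
print(counts)
```

The counting happens in pandas, so matplotlib only has to draw as many bars as there are distinct values.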

answered Mar 3, 2021 at 17:25

Skippy le Grand Gourou


Nic Scozzaro · Accepted Answer · 2020-07-01 16:16:14Z

1

I was facing the same problem using Pandas .hist() method. For me the solution was:

pd.to_numeric(df['your_data']).hist()

This worked instantly.

answered Jul 1, 2020 at 16:16

Nic Scozzaro


Yuri Feldman · Accepted Answer · 2019-11-05 08:38:33Z

0

For me, hist itself was actually fast (I discovered that after timing it), but there was a delay of a few seconds before the figure was updated. Calling figure.canvas.draw() after the call to hist made it update immediately. I was calling hist inside a matplotlib callback in a JupyterLab cell (qt5 backend).
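
A rough sketch of that pattern, under some assumptions: the callback name on_event and its argument are hypothetical, and the Agg backend is used here only so the snippet runs without a GUI (the answer itself used qt5).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; the answer used qt5
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()


def on_event(values):
    """Hypothetical callback body: replace the histogram with new values."""
    ax.clear()
    ax.hist(values, bins=20)
    fig.canvas.draw()  # redraw now instead of waiting for the GUI to catch up


on_event(np.random.default_rng(0).normal(size=1_000))
n_rectangles = len(ax.patches)
print("bars drawn:", n_rectangles)
```

Timing hist and the draw call separately, as the answer did, shows which of the two is responsible for the delay.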

answered Nov 5, 2019 at 8:38

Yuri Feldman


Oded Ben Dov · Accepted Answer · 2020-02-26 09:26:35Z

0

Anyone running into the issue I had - (which is totally my bad :) )

If you're dealing with numbers, make sure when reading from CSV that your datatype is int/float, and not string.

values_arr = .... .flatten().astype('float')

answered Feb 26, 2020 at 9:26

Oded Ben Dov


Shengge Yang · Accepted Answer · 2020-03-19 05:11:27Z

0

If you are working with pandas, make sure the data you pass to plt.hist() is a 1-D Series rather than a DataFrame. This helped me out.

answered Mar 19, 2020 at 5:11

Shengge Yang


pkumar90 · Accepted Answer · 2024-10-04 05:53:59Z

0

For PyCharm on a Windows machine, putting the following snippet before the plotting call (sns.histplot in my case) worked:

import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Qt5Agg')

answered Oct 4, 2024 at 5:53

pkumar90
