I want to draw a histogram and a line plot on the same graph. To do that, I need the histogram to be a probability mass function, so I want to have probability values on the y-axis. However, I don't know how to do that, because using the normed option didn't help. Below is my source code and a sample of the data. I would be very grateful for any suggestions.

import matplotlib.pyplot as plt

data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831, 2771, 4005, 1000, 1580, 7163, 866, 1732, 3361, 2599, 4006, 3583, 1222, 2676, 1401, 2598, 697, 4078, 5016, 1250, 7083, 3378, 600, 1221, 2511, 9244, 1732, 2295, 469, 4583, 1733, 1364, 2430, 540, 2599, 12254, 2500, 6056, 833, 1600, 5317, 8333, 2598, 950, 6086, 4000, 2840, 4851, 6150, 8917, 1108, 2234, 1383, 2174, 2376, 1729, 714, 3800, 1020, 3457, 1246, 7200, 4001, 1211, 1076, 1320, 2078, 4504, 600, 1905, 2765, 2635, 1426, 1430, 1387, 540, 800, 6500, 931, 3792, 2598, 5033, 1040, 1300, 1648, 2200, 2025, 2201, 2074, 8737, 324]
plt.style.use('ggplot')
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.xlabel("Incomes")
# normed=True is the option in question: the resulting bar heights do not sum to 1
plt.hist(data, bins=50, color="blue", alpha=0.5, normed=True)
plt.show()

3 Comments

  • What do you mean by the normed option didn't help? And what exactly is your question? How to normalize the distribution? Or how to plot a line over a histogram? Commented Jun 17, 2015 at 11:16
  • @hitzig. My question is exactly what I wrote: "I want to have probability values on the y-axis." And according to the documentation, the normed option doesn't guarantee that the values on the y-axis describe probabilities (they don't add up to 1). Commented Jun 17, 2015 at 11:36
  • normed is deprecated for hist(). Use the density keyword argument instead. Commented Nov 3, 2020 at 20:21

2 Answers

As far as I know, matplotlib does not have this function built in. However, it is easy enough to replicate:

    import numpy as np
    heights, bins = np.histogram(data, bins=50)
    # Convert to float before dividing so the heights are fractions of the total
    heights = heights.astype(float) / heights.sum()
    plt.bar(bins[:-1], heights, width=(max(bins) - min(bins))/len(bins), color="blue", alpha=0.5)

Edit: Here is another approach from a similar question:

    # Each observation gets weight 1/N, so the bar heights sum to 1
    weights = np.ones_like(data, dtype=float) / len(data)
    plt.hist(data, bins=50, weights=weights, color="blue", alpha=0.5)
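
The weights approach can be combined with the line plot the question asks for. What follows is only a minimal sketch under some assumptions: it uses the first ten incomes from the question as sample data, 5 bins instead of 50, and a red line through the bar tops as a stand-in for whatever curve is actually to be overlaid. The names `sample`, `edges`, and `centers` are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# First ten incomes from the question, used here only as sample data.
sample = np.array([12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831])

# Each observation contributes 1/N, so the bar heights are probabilities.
weights = np.full(len(sample), 1.0 / len(sample))

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(sample, bins=5, weights=weights,
                            color="blue", alpha=0.5)

# Overlay a line on the same axes, placed at the bin centers.
# (Assumption: the line simply follows the bar tops for illustration.)
centers = 0.5 * (edges[:-1] + edges[1:])
ax.plot(centers, heights, color="red", marker="o")

ax.set_xlabel("Incomes")
ax.set_ylabel("Probability")
# Call plt.show() here to display the figure interactively.
```

Because both artists are drawn on the same `ax`, the histogram and the line share one y-axis in probability units.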

6 Comments

When you pass normed=True it does exactly that: values = values / sum(values)
No, it doesn't. It makes a probability density function, so that the bin size multiplied by the height sums to one. See, e.g., stackoverflow.com/questions/3866520/…
Looking at the source it sure looks like it takes values per bin and divides it by sum of all values, doesn't it?
m = (m.astype(float) / db) / m.sum() is the relevant line. That db makes all the difference: it makes the integral of f(x)dx equal one, approximating a continuous distribution. OP wants f(x) to sum to one, approximating a discrete distribution. If the bin sizes are equal to 1, the definitions coincide. Otherwise, you need to do something like my answer. Look up probability mass function vs. density function for more details.
@mmdanziger Thank you for your answer! The first solution works very well and is very helpful. But of course, I will also check the second suggestion. I just added an additional 'float' during the division, because I got zeros instead of float values.
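
The density versus mass distinction discussed in these comments can be checked numerically. This is a small sketch, assuming the first ten incomes from the question as sample data and 5 bins; it uses `np.histogram` with `density=True` as the counterpart of the density-style normalization, and the variable names are illustrative.

```python
import numpy as np

# First ten incomes from the question, used here only as sample data.
sample = np.array([12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831])

counts, edges = np.histogram(sample, bins=5)
widths = np.diff(edges)

# Probability mass: fraction of observations per bin; the heights sum to 1.
mass = counts / counts.sum()

# Probability density: mass per unit of x; height times width sums to 1.
density, _ = np.histogram(sample, bins=5, density=True)

print("mass heights sum to 1:   ", np.isclose(mass.sum(), 1.0))
print("density areas sum to 1:  ", np.isclose((density * widths).sum(), 1.0))
print("density == mass / width: ", np.allclose(density, mass / widths))
```

With bins this wide, the density heights are tiny numbers, which is why they do not look like probabilities on the y-axis.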

This is old, but I found it and was about to use it before I noticed a couple of mistakes, so I figured I'd add some fixes. In the example, @mmdanziger passes the bin edges to plt.bar; however, plt.bar centers each bar on its x value by default, so you need to pass the bin centers instead. The example also assumes that the bins are of equal width, which is fine "most" of the time. But you can also pass an array of widths, which keeps you from inadvertently forgetting and making a mistake. So here's a more complete example:

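import numpy as np
import matplotlib.pyplot as plt

heights, bins = np.histogram(data, bins=50)
heights = heights / heights.sum()  # fraction of observations in each bin
bin_centers = 0.5 * (bins[1:] + bins[:-1])  # plt.bar centers bars on x by default
bin_widths = np.diff(bins)  # one width per bin, so unequal bins also work
plt.bar(bin_centers, heights, width=bin_widths, color="blue", alpha=0.5)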

@mmdanziger's other option of passing weights = np.ones_like(data)/len(data) to plt.hist() does the same thing, and for many it is an easier approach.

1 Comment

Can you please tell me what's the purpose of using bin centers?
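
On the bin-centers question: `plt.bar` treats its x values as bar centers by default (`align="center"`), so passing left bin edges shifts every bar half a bin to the left. The sketch below, which assumes the first ten incomes from the question as sample data and 5 bins, compares the two equivalent ways of getting the bars into the right place: bin centers with the default alignment, or left edges with `align="edge"`.

```python
import matplotlib.pyplot as plt
import numpy as np

# First ten incomes from the question, used here only as sample data.
sample = np.array([12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831])

counts, edges = np.histogram(sample, bins=5)
heights = counts / counts.sum()
widths = np.diff(edges)
centers = 0.5 * (edges[:-1] + edges[1:])

fig, (ax_center, ax_edge) = plt.subplots(1, 2, sharey=True)

# Option 1: x values are bin centers, default align="center".
bars_center = ax_center.bar(centers, heights, width=widths,
                            color="blue", alpha=0.5)

# Option 2: x values are left bin edges, with align="edge".
bars_edge = ax_edge.bar(edges[:-1], heights, width=widths, align="edge",
                        color="blue", alpha=0.5)

# In both cases each rectangle starts at the left edge of its bin.
left_center = [bar.get_x() for bar in bars_center]
left_edge = [bar.get_x() for bar in bars_edge]
print(np.allclose(left_center, left_edge))
```

Either option draws each bar over the interval its bin actually covers; which one to use is mostly a matter of taste.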
