How to build a histogram of a 2-dimensional NumPy array

Question

I have a 256x256 matrix of values and I would like to plot a histogram of these values.

If I am not mistaken, the histogram must be computed from a vector of values, correct? So here is what I have tried:

import numpy as np
import matplotlib.pyplot as plt

d = np.load("BB_Digital.npy")

n, bins, patches = plt.hist(x=d.ravel(), color='#0504aa', bins='auto', alpha=0.7, rwidth=0.85)

plt.grid(axis='y', alpha=0.75)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Blue channel co-occurrence matrix')
maxfreq = n.max()

# Set a clean upper y-axis limit.
plt.ylim(ymax=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
plt.show()

But then, I have a very strange result:

When I don't use the ravel function (use the 2D matrix) the following result is shown:

However, both histograms seem to be wrong, as I verified later:

>>> np.count_nonzero(d==0)
51227
>>> np.count_nonzero(d==1)
2529
>>> np.count_nonzero(d==2)
1275
>>> np.count_nonzero(d==3)
885
>>> np.count_nonzero(d==4)
619
>>> np.count_nonzero(d==5)
490
>>> np.count_nonzero(d==6)
403
>>> np.max(d)
12518
>>> np.min(d)
0

How can I build a correct histogram?

P.S.: Here is the file, if you could help me.

@Xbel it's just that your data is too skewed. You have too many zeros in your data.
– Quang Hoang, Commented Oct 1, 2020 at 13:45
@Xbel even that way the histogram seems to be wrong, I don't have 60000 zero values :-(
– mad, Commented Oct 1, 2020 at 13:47
You have 51k zeros. Remove the zeros and your data is still skewed, with 2.5k 1's, 1.28k 2's and so on. By the way, it looks like you are working with an image. You should also consider cv2.
– Quang Hoang, Commented Oct 1, 2020 at 13:49
Use a log scale on both axes. But I think you would be better off with plt.imshow and a colorbar.
– Quang Hoang, Commented Oct 1, 2020 at 13:56

JohanC · Accepted Answer · 2020-10-01 21:32:31Z

The data seems to be discrete. Setting explicit bin boundaries at the halves could show the frequency of each value. As there are very high but infrequent values, the following example cuts off at 50:

import numpy as np
from matplotlib import pyplot as plt

d = np.load("BB_Digital.npy")

plt.hist(d.ravel(), bins=np.arange(-0.5, 51), color='#0504aa', alpha=0.7, rwidth=0.85)
plt.yscale('log')
plt.margins(x=0.02)
plt.show()
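
To check that bin edges at the halves really give one bin per integer value, the histogram counts can be compared against a direct tally. This is only a minimal sketch on synthetic data: the array `d_demo` is made up for illustration and merely mimics a skewed 256x256 matrix of counts; it is not the asker's file.

```python
import numpy as np

# Hypothetical stand-in for the real data: a skewed 256x256 integer matrix.
rng = np.random.default_rng(0)
d_demo = rng.geometric(p=0.3, size=(256, 256)) - 1  # values 0, 1, 2, ...

# Edges at -0.5, 0.5, ..., 50.5 put each integer 0..50 in its own bin.
edges = np.arange(-0.5, 51)
counts, _ = np.histogram(d_demo.ravel(), bins=edges)

# Direct tally of each integer value, truncated to the same 0..50 range.
tally = np.bincount(d_demo.ravel(), minlength=51)[:51]

print(np.array_equal(counts, tally))
```

If the two arrays agree, the bar heights drawn by `plt.hist` with the same `bins` are the per-value frequencies, which is what `np.count_nonzero(d == k)` reports for a single `k`.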

Another visualization could show a pcolormesh where the colors use a logarithmic scale. As the values start at 0, adding 1 avoids taking the logarithm of zero, which is minus infinity:

from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np

d = np.load("BB_Digital.npy")
plt.pcolormesh(d + 1, norm=LogNorm(), cmap='inferno')
plt.colorbar()
plt.show()
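
The reason for the `+ 1` can be seen without plotting: the logarithm of 0 is minus infinity, so zero cells have no finite position on a logarithmic color scale. A small sketch with a made-up array (`sample` is illustrative and is not the asker's data):

```python
import numpy as np

# Made-up values spanning the range reported in the question (0 up to 12518).
sample = np.array([0, 1, 6, 12518])

# log10(0) is -inf; silence the divide-by-zero warning for this demo.
with np.errstate(divide="ignore"):
    raw = np.log10(sample)

# Shifting by 1 maps 0 to log10(1) = 0 and keeps every value finite.
shifted = np.log10(sample + 1)

print(raw[0], shifted[0])
```

The shift offsets the colorbar labels by one (a cell labelled 1 actually holds a 0), which is usually acceptable for a qualitative view.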

Yet another visualization concentrates on the diagonal values:

# Reuses d, np and plt from the snippets above.
plt.plot(np.diagonal(d), color='navy')
ind_max = np.argmax(np.diagonal(d))
plt.vlines(ind_max, 0, d[ind_max, ind_max], colors='crimson', ls=':')
plt.yscale('log')
plt.show()
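
How the diagonal and its maximum are picked out can be sketched on a tiny array. The matrix `m` below is invented for illustration and only stands in for a co-occurrence matrix:

```python
import numpy as np

# Tiny made-up symmetric matrix standing in for a co-occurrence matrix.
m = np.array([[9, 2, 0],
              [2, 40, 1],
              [0, 1, 5]])

diag = np.diagonal(m)            # entries m[i, i]
ind_max = int(np.argmax(diag))   # position of the largest diagonal entry

print(ind_max, diag[ind_max])
```

The index returned by `np.argmax` is the position along the diagonal, so `m[ind_max, ind_max]` is the same value as `diag[ind_max]`; that value sets the height of the dotted vertical line in the plot.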
