
I have a 256x256 matrix of values and I would like to plot a histogram of these values.

If I am not mistaken, the histogram must be computed from a flat vector of values, correct? Here is what I have tried:

import numpy as np
import matplotlib.pyplot as plt

d = np.load("BB_Digital.npy")

n, bins, patches = plt.hist(x=d.ravel(), color='#0504aa', bins='auto', alpha=0.7, rwidth=0.85)

plt.grid(axis='y', alpha=0.75)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Blue channel co-occurrence matrix')
maxfreq = n.max()

# Set a clean upper y-axis limit.
plt.ylim(ymax=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
plt.show()

But then, I have a very strange result:

Vectorized matrix result

When I don't use the ravel function (use the 2D matrix) the following result is shown:

Histogram of 2D matrix

However, both histograms seem to be wrong, as I verified later:

>>> np.count_nonzero(d==0)
51227
>>> np.count_nonzero(d==1)
2529
>>> np.count_nonzero(d==2)
1275
>>> np.count_nonzero(d==3)
885
>>> np.count_nonzero(d==4)
619
>>> np.count_nonzero(d==5)
490
>>> np.count_nonzero(d==6)
403
>>> np.max(d)
12518
>>> np.min(d)
0

How can I build a correct histogram?

P.S. Here is the file, in case it helps.

  • @Xbel the same result. Thanks for your help. Commented Oct 1, 2020 at 13:44
  • @Xbel it's just that your data is too skewed. You have too many zeros in your data. Commented Oct 1, 2020 at 13:45
  • @Xbel even that way the histogram seems to be wrong; I don't have 60000 zero values :-( Commented Oct 1, 2020 at 13:47
  • You have 51k zeros. Remove the zeros and your data is still skewed, with 2.5k 1's, 1.28k 2's and so on. Btw, it looks like you are working with an image. You should also consider cv2. Commented Oct 1, 2020 at 13:49
  • Use a log scale on both axes. But I think you'd do better with plt.imshow and a colorbar. Commented Oct 1, 2020 at 13:56

1 Answer


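The data seems to be discrete. Setting explicit bin boundaries at the half-integers (-0.5, 0.5, 1.5, ...) gives each integer value its own bin, so the plot shows the frequency of each value. As there are very high but infrequent values, the following example cuts off at 50 and uses a logarithmic y-axis: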

import numpy as np
from matplotlib import pyplot as plt

d = np.load("BB_Digital.npy")

plt.hist(d.ravel(), bins=np.arange(-0.5, 51), color='#0504aa', alpha=0.7, rwidth=0.85)
plt.yscale('log')
plt.margins(x=0.02)
plt.show()

example plot
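To see why the original `bins='auto'` plot can appear to show more zeros than actually exist, it may help to compare the exact counts with the counts per bin. The sketch below uses synthetic data as a stand-in for `BB_Digital.npy` (an assumption, since the real file is not reproduced here) and calls `np.histogram`, which `plt.hist` relies on for binning. The exact numbers will differ for the real data.

```python
import numpy as np

# Synthetic stand-in for the real file (assumption): mostly zeros,
# some small positive integers, and one rare very large value.
rng = np.random.default_rng(0)
d = np.zeros((256, 256), dtype=np.int64)
mask = rng.random(d.shape) < 0.2
d[mask] = rng.geometric(0.1, size=int(mask.sum()))
d[100, 100] = 12518

flat = d.ravel()

# Exact number of occurrences of each integer value 0, 1, 2, ...
exact = np.bincount(flat)

# Automatic binning: a few wide bins, each spanning many integer values.
auto_counts, auto_edges = np.histogram(flat, bins='auto')

# Half-integer edges: one bin per integer value from 0 to 50.
unit_counts, unit_edges = np.histogram(flat, bins=np.arange(-0.5, 51))

print("zeros in the data:     ", exact[0])
print("first 'auto' bin holds:", auto_counts[0])
print("first 'auto' bin width:", auto_edges[1] - auto_edges[0])
print("first unit bin holds:  ", unit_counts[0])
```

With data this skewed, the first automatic bin is much wider than 1, so it lumps the zeros together with many other small values. With half-integer edges, the per-bin counts match what `np.count_nonzero(d == k)` reports.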

Another visualization could show a pcolormesh where the colors use a logarithmic scale. As the values start at 0 and the logarithm of 0 is minus infinity, adding 1 keeps every value positive:

from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np

d = np.load("BB_Digital.npy")
plt.pcolormesh(d + 1, norm=LogNorm(), cmap='inferno')
plt.colorbar()
plt.show()

pcolormesh
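The effect of the `+ 1` offset can be checked on a tiny array. This is only an illustration with made-up values, not the asker's data, and it uses plain `np.log10` rather than `LogNorm` to show what happens to zeros on a logarithmic scale:

```python
import numpy as np

# Small made-up matrix containing zeros (not the asker's data).
d = np.array([[0, 1, 9],
              [99, 0, 999]])

# log10(0) is -inf; suppress NumPy's divide-by-zero warning for the demo.
with np.errstate(divide='ignore'):
    raw = np.log10(d)

# Shifting by 1 maps 0 to log10(1) = 0 and keeps every result finite.
shifted = np.log10(d + 1)

print(raw)
print(shifted)
```

The shift changes small values noticeably (0 and 1 become 1 and 2) but hardly affects large ones, which is usually acceptable when the goal is only to make the overall structure visible.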

Yet another visualization concentrates on the diagonal values:

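# Continues from the snippets above: d, np and plt are already defined.
plt.plot(np.diagonal(d), color='navy')
ind_max = np.argmax(np.diagonal(d))
# Mark the position and height of the largest diagonal entry.
plt.vlines(ind_max, 0, d[ind_max, ind_max], colors='crimson', ls=':')
plt.yscale('log')
plt.show()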

values on diagonal
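As a quick check of what the diagonal plot marks, here is a minimal sketch on a hypothetical 3x3 matrix (made-up values): `np.diagonal` extracts the entries `d[i, i]`, and `np.argmax` gives the index of the largest one.

```python
import numpy as np

# Hypothetical 3x3 matrix, used only to illustrate the indexing.
d = np.array([[5, 1, 0],
              [2, 40, 3],
              [0, 1, 7]])

diag = np.diagonal(d)           # entries d[0, 0], d[1, 1], d[2, 2]
ind_max = int(np.argmax(diag))  # position of the largest diagonal entry
peak = d[ind_max, ind_max]      # its value, used as the top of the vline

print(diag, ind_max, peak)
```

Note that any zero on the diagonal has no finite position on a logarithmic y-axis, so such points would not be drawn at a meaningful height.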
