
I have a massive data set where I need to split my plot into a grid and count the number of points within each grid square. I'm following a method outlined here:

with a stripped-down version of my code below:

import numpy as np
import matplotlib.pyplot as plt

x = [ 1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352, 1.23045712, 1.85877565, 1.26536371, 0.97738438]

y = [ 0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464, 0.13962634, 0.17453293, 0.20943951, 0.23561945]

gridx = np.linspace(min(x),max(x),11)
gridy = np.linspace(min(y),max(y),11)

grid, _, _ = np.histogram2d(x, y, bins=[gridx, gridy])

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

The problem is that the grid marks some squares as containing points even though no points fall inside them; conversely, some squares that do contain data points are shown as empty.

What might be causing this problem? Also, sorry for not attaching the plot, I'm a new user and my reputation isn't high enough.

UPDATE Here's code that generates 100 random points and attempts to plot them in a 2-D histogram:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)

y = np.random.rand(100)

gridx = np.linspace(0,1,11)
gridy = np.linspace(0,1,11)

grid, __, __ = np.histogram2d(x, y, bins=[gridx, gridy])

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

Yet when I run it I have the same problem as before: the locations of the points and the colors corresponding to point-location-density don't agree. Does this happen when anyone runs this code for themselves?

SECOND UPDATE

And at the risk of beating a dead horse, here's code for a parametric plot:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0,1,100)
x = np.sin(t)
y = np.cos(t)

gridx = np.linspace(0,1,11)
gridy = np.linspace(0,1,11)

#grid, __, __ = np.histogram2d(x, y, bins=[gridx, gridy])
grid, __, __ = np.histogram2d(x, y)

plt.figure()
plt.plot(x, y, 'ro')
plt.grid(True)

plt.figure()
plt.pcolormesh(gridx, gridy, grid)
plt.plot(x, y, 'ro')
plt.colorbar()

plt.show()

which makes me think this is all some kind of weird scaling issue. Still totally lost though...

  • The reason the above does not work may be the lack of data points in your data. You seem to have only 9 data points for x and y, whereas the example you follow has 100 data points. Try the same example with 9 points and it does not work! Commented Jun 2, 2014 at 21:13
  • Could np.histogram2d have a problem with randomly scattered small numbers? I tried what you suggested with 100 points but it still didn't work. Strangely enough, when I tried a test case of x and y equal to linspace(0,1,100), the colormesh function worked perfectly. Commented Jun 2, 2014 at 22:02
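A note on the linspace test case mentioned in the comment above: when x and y are the same array, every point falls in a diagonal bin, so the count array equals its own transpose and any mix-up between the x and y axes of the array would be invisible. A minimal sketch of that check, assuming the same 10-by-10 grid on the unit square as in the question (the variable names are only illustrative):

```python
import numpy as np

# Same grid as in the question: 10 bins per axis on [0, 1].
edges = np.linspace(0, 1, 11)

# Test case from the comment: x and y are the same array.
t = np.linspace(0, 1, 100)
H_diag, _, _ = np.histogram2d(t, t, bins=[edges, edges])

# Every point has identical x and y, so it lands in a diagonal bin.
# A diagonal array equals its transpose, so swapping the axes
# changes nothing for this particular data set.
print(np.array_equal(H_diag, H_diag.T))
```

Scattered data has no such symmetry, which would explain why only the linspace case appeared to work.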

2 Answers

I was able to get your example to work by using imshow with interpolation instead of pcolormesh. See sample code below.

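I think the problem may be that pcolormesh uses a different axis convention than the array returned by histogram2d. The results of pcolormesh look as if the upper left and lower right are flipped, which is what you would see if the count array were drawn without being transposed (note that the imshow call below uses H.T).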

The result with imshow looks like:

[Figure: imshow result]

The sample code:

import numpy as np
import matplotlib.pyplot as plt

def doPlot():

    x = [ 1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352, 1.23045712, 1.85877565, 1.26536371, 0.97738438]

    y = [ 0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464, 0.13962634, 0.17453293, 0.20943951, 0.23561945]

    gridx = np.linspace(min(x),max(x),11)
    gridy = np.linspace(min(y),max(y),11)

    H, xedges, yedges = np.histogram2d(x, y, bins=[gridx, gridy])

    plt.figure()
    plt.plot(x, y, 'ro')
    plt.grid(True)

    #wrong origin convention for pcolormesh?
    #plt.figure()
    #plt.pcolormesh(gridx, gridy, H)
    #plt.plot(x, y, 'ro')
    #plt.colorbar()


    plt.figure()
    myextent  =[xedges[0],xedges[-1],yedges[0],yedges[-1]]
    plt.imshow(H.T,origin='lower',extent=myextent,interpolation='nearest',aspect='auto')
    plt.plot(x,y,'ro')
    plt.colorbar()

    plt.show()

if __name__=="__main__":
    doPlot()
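
For what it's worth, pcolormesh can probably be made to agree with the scatter plot too. histogram2d puts the x bins along the first axis of the returned array, while pcolormesh reads rows as y and columns as x, so passing the transpose should line things up. This is a minimal sketch under that assumption; the helper names count_grid and plot_counts are made up for illustration and are not part of numpy or matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1.83259571, 1.76278254, 1.38753676, 1.6406095, 1.34390352,
     1.23045712, 1.85877565, 1.26536371, 0.97738438]
y = [0.04363323, 0.05235988, 0.09599311, 0.10471976, 0.1134464,
     0.13962634, 0.17453293, 0.20943951, 0.23561945]


def count_grid(x, y, nbins=10):
    """Count points per grid square; H[i, j] has x in bin i and y in bin j."""
    xedges = np.linspace(min(x), max(x), nbins + 1)
    yedges = np.linspace(min(y), max(y), nbins + 1)
    H, xedges, yedges = np.histogram2d(x, y, bins=[xedges, yedges])
    return H, xedges, yedges


def plot_counts(x, y):
    """Draw the counts with pcolormesh and overlay the raw points."""
    H, xedges, yedges = count_grid(x, y)
    plt.figure()
    # pcolormesh treats rows as y and columns as x, hence the transpose.
    plt.pcolormesh(xedges, yedges, H.T)
    plt.plot(x, y, 'ro')
    plt.colorbar()
    plt.show()
```

Calling plot_counts(x, y) should give a colored square under every red point, the same picture as the imshow version above.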

Referencing numpy histogram2d's documentation...

The careful reader will note that the parameters are backwards.

histogram2d(y, x, bins=(xedges, yedges))

Compute the bi-dimensional histogram of two data samples.

Parameters

x : array_like, shape (N,) An array containing the x coordinates of the points to be histogrammed.

y : array_like, shape (N,) An array containing the y coordinates of the points to be histogrammed.

Ergo, you supplied your x's to the y parameter of the function and your y's to the x parameter.
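
One way to check which axis of the returned array belongs to which input is to histogram a single point. This is only a small sketch, with a 2-by-2 grid on the unit square chosen for illustration:

```python
import numpy as np

# One point at x=0.9, y=0.1 on a 2x2 grid over the unit square.
edges = np.array([0.0, 0.5, 1.0])
H, xedges, yedges = np.histogram2d([0.9], [0.1], bins=[edges, edges])

# The first axis of H indexes the x bins and the second the y bins,
# so this point is counted in H[1, 0] (second x bin, first y bin).
print(H)

# Plotting functions that treat rows as y and columns as x
# need the transpose, where the same count sits at [0, 1].
print(H.T)
```

If the point shows up in the wrong corner of a plot, the array most likely needs to be transposed before it is drawn.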

Regards
