I am trying to create a 2D histogram from a Pandas data frame "rates". The X and Y axes are supposed to be transforms of the data frame columns, i.e., they are scaled from the original columns, and the bin heights correspond to the number of hits in each x/y bin.

import numpy, pylab, pandas
import matplotlib.pyplot as plt

list(rates.columns.values)
['sizes', 'transfers', 'positioning']

x=(rates["sizes"]/1024./1024.)
y=((rates["sizes"]/rates["transfers"])/1024.)+rates["positioning"]

So I try to feed them into a numpy 2D histogram with

histo, xedges, yedges = numpy.histogram2d(x, y, bins=(100,100))

However, this fails with

File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
 hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 363, in histogramdd
 decimal = int(-log10(mindiff)) + 6
ValueError: cannot convert float NaN to integer

I have already dropped all NaNs from my frame with 'rates.dropna()', but judging from the error, I guess it is not due to NaNs in my frame.

Maybe somebody has an idea what is going wrong here?

  • This might be difficult to answer if we don't know what data you're using as input. Could you find the smallest possible arrays that give the error, along with the version of numpy that you're using? Commented Jun 23, 2015 at 16:53
  • Hi, I probably have to ask a stupid question first about how to plot a subset: I tried to plot a slice, e.g., H, xedges, yedges = np.histogram2d(x[1:1000], y[1:1000], bins=(10,10)), but got an attribute error: AttributeError: The dimension of bins must be equal to the dimension of the sample x. I would have assumed that the dimensions are equal, or are they not? My versions are numpy 1.8.2 and pandas 0.15.2. Commented Jun 24, 2015 at 7:30

1 Answer

With help from @jme I got on the right track.

I had not checked for problematic values: a pair x:y = 0.0:inf obviously cannot be a valid point for a 2D histogram, i.e., when transforming the original values I have to catch such cases.
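One way to catch such cases is to mask out every pair where either coordinate is not finite before building the histogram. The following is only a minimal sketch: the sample values in "rates" are made up for illustration, and the names mask, x_clean and y_clean are not from my original script. Note that dropna() alone would not help here, since inf is not NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the zeros in "transfers" trigger a division by zero.
rates = pd.DataFrame({
    "sizes": [1048576.0, 2097152.0, 0.0, 4194304.0, 3145728.0],
    "transfers": [4.0, 8.0, 0.0, 0.0, 6.0],
    "positioning": [1.0, 2.0, 0.5, 1.5, 3.0],
})

# Same transforms as in the question.
x = rates["sizes"] / 1024.0 / 1024.0
y = (rates["sizes"] / rates["transfers"]) / 1024.0 + rates["positioning"]

# Keep only the pairs where both coordinates are finite (no NaN, no +/-inf).
mask = np.isfinite(x) & np.isfinite(y)
x_clean = np.asarray(x[mask])
y_clean = np.asarray(y[mask])

histo, xedges, yedges = np.histogram2d(x_clean, y_clean, bins=(10, 10))
```

With this sample data, the row with 0/0 yields NaN and the row with a non-zero size over zero transfers yields inf; both are dropped by the mask, so only finite pairs reach histogram2d.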

Another thing: numpy's histogram had some issues for me with DataFrame series, so I had to get a proper numpy.array from the series to plot them properly, e.g.,

histo, xedges, yedges = np.histogram2d(np.array(x[1:MAX]),np.array(y[1:MAX]), bins=(100,100))

for slicing the series up to some variable MAX
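As a self-contained sketch of that conversion (the random input data and the value of MAX are assumptions for illustration only), the slice can be taken by position with .iloc and then turned into a plain array, so that histogram2d never sees the pandas index:

```python
import numpy as np
import pandas as pd

MAX = 1000  # hypothetical upper bound for the slice

# Made-up input series standing in for the transformed x and y columns.
rng = np.random.default_rng(0)
x = pd.Series(rng.random(5000))
y = pd.Series(rng.random(5000))

# .iloc slices by position; np.asarray returns the values without the index.
x_arr = np.asarray(x.iloc[1:MAX])
y_arr = np.asarray(y.iloc[1:MAX])

histo, xedges, yedges = np.histogram2d(x_arr, y_arr, bins=(100, 100))
```

Whether the explicit conversion is still needed probably depends on the numpy and pandas versions in use; I only observed the problem with numpy 1.8.2 and pandas 0.15.2.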
