16

I have a set of float values (always non-negative and less than 1) that I want to bin into a histogram, i.e. each bar in the histogram covers a sub-range of the values in [0, 0.150).

The data I have looks like this:

0.000
0.005
0.124
0.000
0.004
0.000
0.111
0.112

With my code below, I expect to get a result that looks like:

[0, 0.005) 5
[0.005, 0.011) 0
...etc.. 

I tried to do this binning with the code below, but it doesn't seem to work. What's the right way to do it?

#! /usr/bin/env python


import fileinput, math

log2 = math.log(2)

def getBin(x):
    return int(math.log(x+1)/log2)

diffCounts = [0] * 5

for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;

    diffCounts[ str(getBin(diff)) ] += 1

maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)


for i in range(maxBin+1):
     lo = 2**i - 1
     hi = 2**(i+1) - 1
     binStr = '[' + str(lo) + ',' + str(hi) + ')'
     print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))


3
  • Well, in the example "what you expect...", if you have ranges defined as [0, 0.005) (right open) and [0.005, 0.011) (closed left) then the output should be: [0, 0.005) 4 [0.005, 0.011) 1 etc... Commented Nov 12, 2009 at 10:56
  • "Doesn't seem to work?" Any specific complaint? Or do you expect everyone to run it and guess what you don't like about the output? Commented Nov 12, 2009 at 10:58
  • To avoid re-inventing the wheel, especially if the next step is plotting your histogram: you should consider using the Matplotlib framework which handles all that. Commented Nov 12, 2009 at 12:13

3 Answers

20

When possible, don't reinvent the wheel. NumPy has everything you need:

#!/usr/bin/env python
import numpy as np

a = np.fromfile(open('file', 'r'), sep='\n')
# [ 0.     0.005  0.124  0.     0.004  0.     0.111  0.112]

# You can set arbitrary bin edges:
bins = [0, 0.150]
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [8]
# bin_edges: [ 0.    0.15]

# Or, if bins is an integer, it sets the number of equal-width bins:
bins = 4
hist, bin_edges = np.histogram(a, bins=bins)
# hist: [5 0 0 3]
# bin_edges: [ 0.     0.031  0.062  0.093  0.124]
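
For the 0.005-wide bins the question asks about, explicit edges are one option. The following is only a sketch: it assumes the inputs have three decimal places, so they can be scaled to whole thousandths before binning, which keeps a value such as 0.005 from landing on the wrong side of a floating-point edge. The inline array stands in for the file contents.

```python
import numpy as np

# Stand-in for the values read from the file
a = np.array([0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112])

# Assumption: three-decimal inputs, so work in whole thousandths
scaled = np.round(a * 1000)
edges = np.arange(0, 151, 5)  # 0, 5, ..., 150 -> 30 bins of width 0.005
hist, _ = np.histogram(scaled, bins=edges)

# Print each bin in the "[lo, hi) count" form used in the question
for lo, hi, count in zip(edges[:-1], edges[1:], hist):
    print("[%.3f, %.3f) %d" % (lo / 1000.0, hi / 1000.0, count))
```

Note that np.histogram treats every bin as half-open except the last one, which also includes its right edge.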

2 Comments

And if you want a normalized histogram, you can add the line: hist = hist*1.0/sum(hist)
And if you want the integral over the bin range to be 1, use density=True.
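
The two normalizations mentioned in these comments give different numbers, which a small sketch may make clearer. It reuses the sample data from the question as an inline array (an assumption in place of reading the file); the names fractions, density and area are illustrative only.

```python
import numpy as np

a = np.array([0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112])
hist, bin_edges = np.histogram(a, bins=4)

# Fraction of samples per bin: the values themselves sum to 1
fractions = hist / hist.sum()

# Probability density: value times bin width sums to 1
density, _ = np.histogram(a, bins=4, density=True)
area = (density * np.diff(bin_edges)).sum()

print(fractions)
print(area)
```

With equal-width bins the two differ only by a constant factor (the bin width), but with unequal bins they are not interchangeable.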
4
from pylab import *

# Read one float per line
data = []
with open('pulse_data.txt') as inf:
    for line in inf:
        data.append(float(line))

# Binning: B equal-width bins spanning [minv, maxv]
B = 50
minv = min(data)
maxv = max(data)
# B+1 counters: the extra last one catches d == maxv
bincounts = [0] * (B + 1)
for d in data:
    b = int((d - minv) / (maxv - minv) * B)
    bincounts[b] += 1

# Plot histogram
plot(bincounts, 'o')
show()


3

The first error is:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ str(getBin(diff)) ] += 1
TypeError: list indices must be integers

Why are you converting an int to a str when an int is needed? Fix that, and then we get:

Traceback (most recent call last):
  File "C:\foo\foo.py", line 17, in <module>
    diffCounts[ getBin(diff) ] += 1
IndexError: list index out of range

because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 21, in <module>
    maxBin = max(maxdiff)
TypeError: 'int' object is not iterable

maxdiff is a single value out of your list of ints, so what is max doing here? Remove it, now we get:

6
Traceback (most recent call last):
  File "C:\foo\foo.py", line 28, in <module>
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
TypeError: argument 2 to map() must support iteration

Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines from this:

 binStr = '[' + str(lo) + ',' + str(hi) + ')'
 print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))

to this:

 print "[%f, %f)\t%r" % (lo, hi, diffCounts[i])

Now it prints:

6
[0.000000, 1.000000)    3
[1.000000, 3.000000)    0
[3.000000, 7.000000)    2
[7.000000, 15.000000)   0
[15.000000, 31.000000)  0
[31.000000, 63.000000)  0
[63.000000, 127.000000) 3

I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve binary powers, but isn't making sense to me...
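
Putting the fixes above together, a Python 3 sketch of the same power-of-two bucketing might look like this. It is not the asker's intended scheme, only a consolidated version of the corrected script: the data is inlined instead of read with fileinput, and the names get_bin and log2_histogram are made up for illustration.

```python
import math
from collections import Counter

def get_bin(x):
    # Bin i covers [2**i - 1, 2**(i+1) - 1), as in the question's getBin
    return int(math.log2(x + 1))

def log2_histogram(values, scale=1000):
    # Count values per bin; a Counter avoids guessing the bucket count up front
    counts = Counter(get_bin(v * scale) for v in values)
    max_bin = max(counts)
    return [counts[i] for i in range(max_bin + 1)]

# Sample data from the question, inlined for the sketch
data = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]
counts = log2_histogram(data)

for i, c in enumerate(counts):
    lo, hi = 2**i - 1, 2**(i + 1) - 1
    print("[%d, %d)\t%d" % (lo, hi, c))
```

Using a Counter instead of a fixed-size list sidesteps the IndexError from having too few buckets.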
