
I'm trying to create a histogram from an array of numbers in the range [0,1].

Is there a way to use group_by to separate the array into N groups/bins by numeric interval (or some other fun Ruby one-liner)?

This is my current, boring, solution:

# values == array containing floating point numbers in the range [0,1]

n = 10

# EDITED from Array.new(n, 0) to Array.new(n, []), thanks emaillenin!
histogram = Array.new(n, [])
values.each do |val|
  histogram[(val * n).ceil - 1].push(val)
end
  • This code wouldn't work: histogram[anything] is a Fixnum, and you cannot call push on it. Commented Jul 25, 2014 at 17:43
  • @emaillenin Oops, you're correct! I meant that to be an array of arrays, my bad! Commented Jul 25, 2014 at 17:46
  • Should be Array.new(n) { [ ] } to avoid all slots referencing the same array. Commented Jul 25, 2014 at 17:50
  • Why does it have to be a one-liner? That constraint often results in code that is hard to read or understand. Commented Jul 25, 2014 at 19:32
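
To make the points raised in these comments concrete, here is a minimal sketch of the loop from the question with a separate array per slot. The helper name bin_values is made up for illustration, and placing 0.0 in the first bin is an assumption about the intended behaviour (with ceil - 1, a value of 0.0 would otherwise get index -1, which Ruby treats as the last slot).

```ruby
# Hypothetical helper: split values in [0,1] into n bins, where bin i
# holds values in (i/n, (i+1)/n] and 0.0 is placed in bin 0 (an assumption).
def bin_values(values, n)
  bins = Array.new(n) { [] }   # block form: every slot gets its own array
  values.each do |val|
    index = (val * n).ceil - 1
    index = 0 if index < 0     # 0.0 gives -1, which would index the last bin
    bins[index].push(val)
  end
  bins
end

# The non-block form shares one array between all slots:
shared = Array.new(2, [])
shared[0].push(:x)
# shared[1] now also contains :x

bins = bin_values([0.0, 0.05, 0.5, 1.0], 10)
```

With n = 10 this always returns exactly ten arrays, including empty ones for bins that received no values.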

2 Answers


Not sure what you're trying to do but maybe this helps?

values = [0.0, 0.1, 0.2, 0.3]
values.group_by { |v| (v * 10).ceil - 1 }

That returns a hash:

{-1=>[0.0], 0=>[0.1], 1=>[0.2], 2=>[0.3]}

2 Comments

Keep in mind that this can yield up to 11 buckets when both 0.0 and 1.0 are present in addition to the other values (0.x).
group_by does this very cleanly.
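
One way to address the 11-bucket concern while keeping group_by is to use floor and clamp the index, so that 1.0 falls into the last bin. This is only a sketch under that assumption; the variable names are illustrative, and products such as v * n are still subject to ordinary floating-point rounding near bin boundaries.

```ruby
n = 10
values = [0.0, 0.1, 0.25, 0.99, 1.0]

# floor puts v in bin i when i/n <= v < (i+1)/n;
# min clamps v == 1.0 (which would give index n) into the last bin.
groups = values.group_by { |v| [(v * n).floor, n - 1].min }

# group_by only returns keys that occur, so fill in the empty bins
# to get exactly n entries.
histogram = Array.new(n) { |i| groups.fetch(i, []) }
```

The group_by call on its own is the one-liner; the last line is only needed if empty bins should appear in the result.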

This is one way to do it.

Code

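def freq_by_bin(nbr_bins, *values)
  # Start with every bin at zero, then add 1 to the bin of each value.
  nbr_bins.times.to_a.product([0]).to_h.tap { |h|
    values.each { |v|
      bin = [(v*nbr_bins).to_i, nbr_bins-1].min  # keep v == 1.0 in the last bin
      h.update({ bin=>1 }) { |_,o,_| o+1 } } }
end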

Example

values =  [0.30, 0.25, 0.63, 0.94, 0.08, 0.94, 0.01,
           0.41, 0.28, 0.69, 0.61, 0.12, 0.66]
freq_by_bin(10, *values)
  #=> {0=>2, 1=>1, 2=>2, 3=>1, 4=>1,
  #    5=>0, 6=>4, 7=>0, 8=>0, 9=>2}

def histogram(nbr_bins, *values)
  h = freq_by_bin(nbr_bins, *values)
  puts "\nfreq"
  h.values.max.downto(0) do |n|
    print "%2d|" % n
    puts nbr_bins.times.with_object('   ') { |i,row|
           row << ((h[i]==n) ? ' X ' : '   ') }
  end
  puts "   __"+"___"*nbr_bins
  puts nbr_bins.times.each_with_object('      ') { |i,row| row << "%2d " % i }
end

histogram(10, *values)

freq
 4|                      X          
 3|                                 
 2|    X     X                    X 
 1|       X     X  X                
 0|                   X     X  X    
   ________________________________
       0  1  2  3  4  5  6  7  8  9 

Notes

  • There are several ways to construct the hash whose elements are bin=>freq. Using Enumerable#group_by, which you mentioned and @diego used, is one. I've used the form of Hash#update (aka Hash#merge!) that takes a block.

  • I used Object#tap merely to avoid the need to create a temporary (non-block) variable for the initialized hash.
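
To illustrate the two notes above, here is a small sketch of how the block form of Hash#update and Object#tap behave, followed by a group_by-based way to build the same bin=>freq hash. The name freq_by_bin_grouped is hypothetical, and clamping 1.0 into the last bin is an assumption about the intended behaviour.

```ruby
# Hash#update (alias merge!) with a block: the block runs only for keys
# present in both hashes and receives (key, old_value, new_value).
counts = { 0 => 2, 1 => 0 }
counts.update({ 0 => 1 }) { |_key, old, _new| old + 1 }  # key 0 exists: 2 + 1
counts.update({ 5 => 1 }) { |_key, old, _new| old + 1 }  # key 5 is new: stored as 1

# Object#tap yields the receiver to the block, then returns the receiver.
result = {}.tap { |h| h[:a] = 1 }

# Hypothetical group_by-based alternative for building the bin=>freq hash.
def freq_by_bin_grouped(nbr_bins, *values)
  groups = values.group_by { |v| [(v * nbr_bins).to_i, nbr_bins - 1].min }
  nbr_bins.times.map { |i| [i, groups.fetch(i, []).size] }.to_h
end

values = [0.30, 0.25, 0.63, 0.94, 0.08, 0.94, 0.01,
          0.41, 0.28, 0.69, 0.61, 0.12, 0.66]
freqs = freq_by_bin_grouped(10, *values)
```

For the example values used in this answer, freqs has the same contents as the hash shown in the Example section.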
