0

I need to group the values of an array into a range-based histogram in Ruby:

values = [ 139, 145, 149, 151, 152, 153, 163, 166, 169 ]

for example:

141 - 145 = 2
146 - 150 = 1
151 - 155 = 3

...

Is there a simple way to use group_by?

2 Answers

3

Since each range has a simple definition, yes:

values.group_by do |v|
  (v-1) / 5
end.values
# => [[139], [145], [149], [151, 152, 153], [163], [166, 169]]

group_by actually returns a Hash whose keys are the block's results (here, the bucket index); calling .values drops those keys, since they aren't useful on their own in this case.
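
To make that concrete, here is a small sketch of what group_by hands back before .values is applied. The variable name grouped is only for illustration and is not part of the answer above.

```ruby
values = [139, 145, 149, 151, 152, 153, 163, 166, 169]

# group_by returns a Hash: each key is the block's result (the bucket
# index) and each value is the array of elements that produced that key.
grouped = values.group_by { |v| (v - 1) / 5 }

grouped.keys    #=> [27, 28, 29, 30, 32, 33]
grouped[30]     #=> [151, 152, 153]
grouped.values  # the same arrays, with the bucket indices discarded
```

The keys look meaningless by themselves, but multiplying one by 5 and adding 1 gives the lower bound of its range, which is what the next step relies on.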

You can get the form you're looking for by mapping each bucket index back to its Range:

values.group_by do |v|
  (v-1) / 5
end.map do |k, a|
  [ (k*5+1..k*5+5), a.length ]
end.to_h
# => {136..140=>1, 141..145=>1, 146..150=>1, 151..155=>3, 161..165=>1, 166..170=>2}
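
If only the counts matter, the intermediate arrays can be skipped. This is a sketch of one alternative, assuming Ruby 2.7 or later for Enumerable#tally; the names bucket_counts and histogram are mine, not from the question.

```ruby
values = [139, 145, 149, 151, 152, 153, 163, 166, 169]

# Count how many values fall into each bucket index (needs Ruby 2.7+).
bucket_counts = values.map { |v| (v - 1) / 5 }.tally

# Turn each bucket index into the inclusive range it stands for.
histogram = bucket_counts.transform_keys { |k| (k * 5 + 1)..(k * 5 + 5) }
#=> {136..140=>1, 141..145=>1, 146..150=>1, 151..155=>3, 161..165=>1, 166..170=>2}
```

Like the group_by version, this only produces ranges that contain at least one value.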

0

To prepare a histogram one normally specifies the smallest value of the first range, the range size and the number of ranges. Some pre-processing of the data may be necessary to determine those values. For example, given

values = [139, 145, 149, 151, 152, 153, 164, 166, 169]
group_size = 5

we might compute the smallest value of the first group and the number of groups as follows:

smallest, largest = values.minmax
  #=> [139, 169] 
start = group_size*(smallest/group_size)
  #=> 135 
nbr_groups = ((largest-start+1)/group_size.to_f).ceil
  #=> 7 

We can now construct an array we can use to create the histogram.

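def group_values(values, start, nbr_groups, group_size)
  # Pre-build every group, including those that will stay empty.
  groups = Array.new(nbr_groups) do |i|
    f = start + i * group_size
    { nbr: 0, range: f..f+group_size-1 }
  end
  # Increment the count of the group each value falls into.
  values.each_with_object(groups) do |v, arr|
    arr[(v-start)/group_size][:nbr] += 1
  end
end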

Let's try it (for the values of start and nbr_groups computed above).

freq = group_values(values, start, nbr_groups, group_size)
  # i.e. group_values(values, 135, 7, 5)
  #=> [{:nbr=>1, :range=>135..139},
  #    {:nbr=>0, :range=>140..144},
  #    {:nbr=>2, :range=>145..149},
  #    {:nbr=>3, :range=>150..154},
  #    {:nbr=>0, :range=>155..159},
  #    {:nbr=>1, :range=>160..164},
  #    {:nbr=>2, :range=>165..169}]

Note that

  • the value of :range for each element of the resulting array is provided for labeling the horizontal axis of the histogram.
  • I initialized the array groups so that groups containing no elements of values (for values 140-144 and 155-159) would be included in the array returned. Had I constructed that array on the fly it would not have contained the hashes for those two groups.
  • to establish the range of frequencies for the vertical axis of the histogram we might compute the following.

freq.map { |h| h[:nbr] }.minmax
  #=> [0, 3] 
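
For contrast with the second note, here is a rough sketch of what building the groups "on the fly" could look like. It reuses start = 135 and group_size = 5 from above; the name on_the_fly and the use of group_by are my own assumptions about what such a version might be.

```ruby
values = [139, 145, 149, 151, 152, 153, 164, 166, 169]
group_size = 5
start = 135 # computed earlier as group_size * (smallest / group_size)

# Build groups only as values are encountered: a group exists only if
# at least one value falls into it.
on_the_fly = values
  .group_by { |v| (v - start) / group_size }
  .map do |i, members|
    first = start + i * group_size
    { nbr: members.size, range: first..first + group_size - 1 }
  end

on_the_fly.size                   #=> 5
on_the_fly.map { |h| h[:range] }
  #=> [135..139, 145..149, 150..154, 160..164, 165..169]
```

The groups for 140..144 and 155..159 never appear, so a chart drawn from this array would have gaps on its horizontal axis unless they were filled in afterwards.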
