I need to group the values of an array into a range-based histogram in Ruby:
values = [ 139, 145, 149, 151, 152, 153, 163, 166, 169 ]
For example:
141 - 145 = 1
146 - 150 = 1
151 - 155 = 3
...
Is there a simple way to use group_by?
Since each range has a simple definition, yes:
values.group_by do |v|
  (v-1) / 5
end.values
# => [[139], [145], [149], [151, 152, 153], [163], [166, 169]]
group_by actually returns a Hash whose keys are the values returned by the block (here, the bucket index), but calling values discards those keys since they aren't useful in this form.
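To see what is being discarded, here is a small sketch that keeps the hash and inspects its keys. The variable name grouped is only for illustration:

```ruby
values = [139, 145, 149, 151, 152, 153, 163, 166, 169]

# group_by returns a Hash: block result => array of matching elements.
grouped = values.group_by { |v| (v - 1) / 5 }

p grouped.keys   # => [27, 28, 29, 30, 32, 33]
p grouped[30]    # => [151, 152, 153]
```

Each key k stands for the range k*5+1..k*5+5, so 30 corresponds to 151..155.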
To get the form you're looking for, map each key back to its Range and count the elements:
values.group_by do |v|
  (v-1) / 5
end.map do |v, a|
  [ (v*5+1..v*5+5), a.length ]
end.to_h
# => {136..140=>1, 141..145=>1, 146..150=>1, 151..155=>3, 161..165=>1, 166..170=>2}
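If Ruby 2.7 or later is available, the same counts can probably be had without building the intermediate arrays, by tallying bucket indexes directly. This is only a sketch and not part of the answer above; the method name range_counts and its width parameter are made up for illustration:

```ruby
# Hypothetical helper: counts values per fixed-width range, with ranges
# aligned as 1..width, width+1..2*width, ... (the same bucketing as (v-1)/5).
def range_counts(values, width)
  values
    .map { |v| (v - 1) / width }   # bucket index for each value
    .tally                         # bucket index => count (Ruby 2.7+)
    .to_h { |i, n| [(i * width + 1..i * width + width), n] }
end

values = [139, 145, 149, 151, 152, 153, 163, 166, 169]
p range_counts(values, 5)
```

For these values it should return the same hash as the group_by version. Like that version, it omits ranges that contain no values.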
To prepare a histogram one normally specifies the smallest value of the first range, the range size and the number of ranges. Some pre-processing of the data may be necessary to determine those values. For example, given
values = [139, 145, 149, 151, 152, 153, 164, 166, 169]
group_size = 5
we might compute the smallest value of the first group and the number of groups as follows:
smallest, largest = values.minmax
#=> [139, 169]
start = group_size*(smallest/group_size)
#=> 135
nbr_groups = ((largest-start+1)/group_size.to_f).ceil
#=> 7
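The two calculations above could be wrapped in a small helper so they can be tried with other group sizes. This is a sketch under that assumption; histogram_bounds is a hypothetical name, not something defined elsewhere in this answer:

```ruby
# Hypothetical helper: returns [start, nbr_groups] for the given data.
def histogram_bounds(values, group_size)
  smallest, largest = values.minmax
  # Round the smallest value down to a multiple of group_size.
  start = group_size * (smallest / group_size)
  # Number of groups needed to cover start..largest.
  nbr_groups = ((largest - start + 1) / group_size.to_f).ceil
  [start, nbr_groups]
end

values = [139, 145, 149, 151, 152, 153, 164, 166, 169]
p histogram_bounds(values, 5)    # => [135, 7]
p histogram_bounds(values, 10)   # => [130, 4]
```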
We can now construct an array we can use to create the histogram.
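def group_values(values, start, nbr_groups, group_size)
  # Pre-build one hash per group so that empty groups are kept.
  groups = Array.new(nbr_groups) do |i|
    f = start + i * group_size
    { nbr: 0, range: f..f+group_size-1 }
  end
  # Increment the count of the group each value falls into.
  values.each_with_object(groups) do |v, arr|
    arr[(v-start)/group_size][:nbr] += 1
  end
end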
Let's try it (for the values of start and nbr_groups computed above).
freq = group_values(values, start, nbr_groups, group_size)
# i.e., group_values(values, 135, 7, 5)
#=> [{:nbr=>1, :range=>135..139},
#    {:nbr=>0, :range=>140..144},
#    {:nbr=>2, :range=>145..149},
#    {:nbr=>3, :range=>150..154},
#    {:nbr=>0, :range=>155..159},
#    {:nbr=>1, :range=>160..164},
#    {:nbr=>2, :range=>165..169}]
Note that:
:range in each element of the resulting array is provided for labeling the horizontal axis of the histogram.
groups is initialized up front so that groups containing no elements of values (for values 140-144 and 155-159) are included in the array returned. Had I constructed that array on the fly it would not have contained the hashes for those two groups.
The smallest and largest group counts can be obtained as follows:
freq.map { |h| h[:nbr] }.minmax
#=> [0, 3]
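As a rough illustration of how freq might be consumed, here is a sketch that turns it into text rows, one per group. The helper name histogram_rows is hypothetical, and freq is written out literally (copied from the result above) so the snippet runs on its own:

```ruby
# freq as returned by group_values above, written out literally.
freq = [
  { nbr: 1, range: 135..139 },
  { nbr: 0, range: 140..144 },
  { nbr: 2, range: 145..149 },
  { nbr: 3, range: 150..154 },
  { nbr: 0, range: 155..159 },
  { nbr: 1, range: 160..164 },
  { nbr: 2, range: 165..169 }
]

# Hypothetical helper: one row per group, one '*' per counted value.
def histogram_rows(freq)
  freq.map do |h|
    format('%d..%d | %s', h[:range].first, h[:range].last, '*' * h[:nbr])
  end
end

puts histogram_rows(freq)
```

Because the empty groups are present in freq, they show up as rows with no stars rather than being skipped.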