3

I have a dataset similar to the following:

[
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 5, :option_style_ids => [4, 5]},
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 5, :option_style_ids => [4, 5]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 12, :option_style_ids => [20]},
  {:option_id => 5, :option_style_ids => [2, 5]}
]

I would like to merge the dataset for an output of:

[
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 5, :option_style_ids => [2, 4, 5]},
  {:option_id => 12, :option_style_ids => [20]}
]

The output above strips the duplicates. However, for the hashes with option_id 5, I need it to combine the option_style_ids array values (some of which differ).

I tried:

r.group_by{|h| h[:option_id]}.map{|k,v| v.reduce(:merge)}

Unfortunately, that did not combine the option_style_ids array values.
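The reason appears to be that a plain Hash#merge keeps only the later value when both hashes share a key. A minimal sketch, using two of the option_id 5 hashes from the dataset (the variable names a, b and combined are illustrative):

```ruby
a = { :option_id => 5, :option_style_ids => [4, 5] }
b = { :option_id => 5, :option_style_ids => [2, 5] }

# Without a block, the value from the argument (b) replaces the one in a,
# so the style id 4 is lost instead of being combined.
combined = a.merge(b)
combined #=> {:option_id=>5, :option_style_ids=>[2, 5]}
```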

3 Comments
  • Don't forget to select an answer if you find any to be helpful. Commented Aug 13, 2015 at 18:47
  • I chose @sawa 's answer because it was the most concise Commented Aug 18, 2015 at 18:22
  • Just in case you didn't know, you can upvote answers. I mention this because I noticed you didn't upvote the answer you selected. Commented Aug 19, 2015 at 1:42

3 Answers

2

This can be done using Hash#update (aka Hash#merge!), in the form that takes a block to determine the values of keys that are present in both hashes being merged.
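As a small standalone illustration of that block form (the hashes totals and extra are made up for this sketch and are not part of the question's data):

```ruby
# Hypothetical hashes, chosen only to show when the block runs.
totals = { :a => [1, 2], :b => [3] }
extra  = { :b => [3, 4], :c => [5] }

# The block is called only for keys present in both hashes (:b here).
# It receives the key, the old value and the new value, and its return
# value becomes the merged value for that key.
totals.update(extra) { |_key, old, new| old | new }

totals #=> {:a=>[1, 2], :b=>[3, 4], :c=>[5]}
```

Keys found in only one of the two hashes (:a and :c) are copied across without the block being consulted.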

Code

def merge_em(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g[:option_id]=>g) do |_,o,n|
      { :option_id=>o[:option_id],
        :option_style_ids=>o[:option_style_ids] | n[:option_style_ids] }
    end
  end.values
end

Example

For the array given in the question, which I'll refer to as arr:

merge_em(arr) 
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #    {:option_id=> 7, :option_style_ids=>[19]},
  #    {:option_id=> 8, :option_style_ids=>[1]},
  #    {:option_id=> 5, :option_style_ids=>[4, 5, 2]},
  #    {:option_id=>12, :option_style_ids=>[20]}] 
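The question's desired output lists the ids for option 5 as [2, 4, 5], whereas Array#| keeps them in first-seen order. If sorted order matters, one possible follow-up step is sketched below. Here merged is a hand-copied subset of the result above, and sorted is an illustrative name:

```ruby
# Hand-copied subset of the merge_em result, used as a stand-in.
merged = [
  { :option_id => 5,  :option_style_ids => [4, 5, 2] },
  { :option_id => 12, :option_style_ids => [20] }
]

# Build new hashes whose style ids are sorted; the originals are untouched.
sorted = merged.map do |h|
  h.merge(:option_style_ids => h[:option_style_ids].sort)
end

sorted
  #=> [{:option_id=>5, :option_style_ids=>[2, 4, 5]},
  #    {:option_id=>12, :option_style_ids=>[20]}]
```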

Explanation

To explain what's going on, let me simplify arr:

arr = [
  { :option_id => 10, :option_style_ids => [9, 10, 11] },
  { :option_id =>  7, :option_style_ids => [19] },
  { :option_id => 10, :option_style_ids => [9, 12] }
]

The steps:

enum = arr.each_with_object({})
  #=> #<Enumerator: [
  #     {:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #     {:option_id=> 7, :option_style_ids=>[19]},
  #     {:option_id=>10, :option_style_ids=>[9, 12]}
  #   ]:each_with_object({})>  

We can view the elements of enum by converting it to an array:

enum.to_a
  #=> [[{:option_id=>10, :option_style_ids=>[9, 10, 11]}, {}],
  #    [{:option_id=> 7, :option_style_ids=>[19]}, {}],
  #    [{:option_id=>10, :option_style_ids=>[9, 12]}, {}]]

As you see, enum contains three elements.

The first element of enum is passed to the block and assigned to the block variables:

g,h = enum.next
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]}, {}] 
g #=> {:option_id=>10, :option_style_ids=>[9, 10, 11]} 
h #=> {} 

We now perform the block calculation:

h.update(g[:option_id]=>g)
  #=> {}.update(10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]})
  #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}} 

update returns the new value of h.

In merging { 10=>g } into h (Ruby permits the shorthand (10=>g) for this), h does not have a key 10, so update's block is not consulted in determining the merged value for h[10].

The next element of enum is passed to the block:

g,h = enum.next
  #=> [{:option_id=>7, :option_style_ids=>[19]},
  #    {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}] 
g #=>  {:option_id=>7, :option_style_ids=>[19]} 
h #=>  {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}} 

Notice that h has been updated.

We now perform the block calculation:

h.update(g[:option_id]=>g)
  #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}
  #     .update(7=>{:option_id=>7, :option_style_ids=>[19]}) 
  #=>   {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #       7=>{:option_id=> 7, :option_style_ids=>[19]}}  

Again, h does not have a key 7, so update's block is not used.

The last element of enum is now passed to the block and the block calculation is performed:

g,h = enum.next
g #=> {:option_id=>10, :option_style_ids=>[9, 12]} 
h #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #     7=>{:option_id=> 7, :option_style_ids=>[19]}} 

h.update(10=>g)

This time h contains the key (10) of the hash being merged into h ({ 10=>g }). update's block is therefore called upon to determine the value for that key in the merged hash. The block is passed three arguments: the key, the old value (from h) and the new value (from the hash being merged):

k,o,n = [10, h[10], g]   
  #=> [10, {:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #    {:option_id=>10, :option_style_ids=>[9, 12]}] 
k #=> 10 
o #=> {:option_id=>10, :option_style_ids=>[9, 10, 11]} 
n #=> {:option_id=>10, :option_style_ids=>[9, 12]} 

We wish the block to return:

{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]}

which we can do most easily like so:

{ :option_id=>o[:option_id],
  :option_style_ids=>o[:option_style_ids] | n[:option_style_ids] } 
  #=> { :option_id=>10,
  #     :option_style_ids=>[9, 10, 11] | [9, 12] }
  #=> { :option_id=>10, :option_style_ids=>[9, 10, 11, 12] }

h[10] is set to this value, so now:

h #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]},
  #     7=>{:option_id=>7, :option_style_ids=>[19]}}

which, because we are finished enumerating enum, is the value returned by each_with_object. The final step is to extract the values of this hash:

h.values
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]},
  #    {:option_id=> 7, :option_style_ids=>[19]}] 

2 Comments

Although I marked another answer as correct, I really appreciate you taking the time to explain this! Do you know, in terms of performance, how your method compares to using inject?
I don't know how this compares with inject performance-wise. If I had to guess I'd put my money on inject, but it would be easy to benchmark the two.
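A rough way to compare the two is sketched below. The method names merge_with_update and merge_with_inject, the helper time_it, the synthetic data and the repeat count are all assumptions made for this sketch, and the timings will vary by machine and Ruby version:

```ruby
# Hypothetical timing harness; data size and repeat count are arbitrary.
def merge_with_update(arr)
  arr.each_with_object({}) do |g, h|
    h.update(g[:option_id] => g) do |_, o, n|
      { :option_id => o[:option_id],
        :option_style_ids => o[:option_style_ids] | n[:option_style_ids] }
    end
  end.values
end

def merge_with_inject(arr)
  arr.group_by { |h| h[:option_id] }.values.map do |a|
    a.inject { |h, g| h.merge(g) { |k, v, w| k == :option_id ? v : (v + w).uniq } }
  end
end

# Synthetic input: 10_000 hashes spread over 100 option ids.
data = Array.new(10_000) do |i|
  { :option_id => i % 100, :option_style_ids => [i % 7, 7 + i % 11] }
end

# Runs the block `runs` times and prints the total wall-clock time.
def time_it(label, runs = 20)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("%-8s %.4f s", label, elapsed)
end

time_it("update") { merge_with_update(data) }
time_it("inject") { merge_with_inject(data) }
```

The standard library's Benchmark module could be used in place of the hand-rolled timer.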

1

array.group_by { |h| h[:option_id] }.values.map do |a|
  a.inject { |h, _h| h.merge(_h) { |k, v, _v| k == :option_id ? v : (v + _v).uniq } }
end
# => [
#   {:option_id=>10, :option_style_ids=>[9, 10, 11]},
#   {:option_id=>7, :option_style_ids=>[19]},
#   {:option_id=>8, :option_style_ids=>[1]},
#   {:option_id=>5, :option_style_ids=>[4, 5, 2]},
#   {:option_id=>12, :option_style_ids=>[20]}
# ]
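
Unpacked into separate steps, the same idea might read as follows. This is a sketch on a reduced array, and the names groups, merged, acc, old and new are illustrative:

```ruby
array = [
  { :option_id => 5,  :option_style_ids => [4, 5] },
  { :option_id => 12, :option_style_ids => [20] },
  { :option_id => 5,  :option_style_ids => [2, 5] }
]

# Step 1: bucket the hashes by :option_id and keep only the buckets.
groups = array.group_by { |h| h[:option_id] }.values

# Step 2: fold each bucket into a single hash. The merge block runs for
# every key the two hashes share: :option_id keeps its old value, while
# the style id arrays are concatenated and de-duplicated.
merged = groups.map do |group|
  group.inject do |acc, h|
    acc.merge(h) do |key, old, new|
      key == :option_id ? old : (old + new).uniq
    end
  end
end

merged
# => [{:option_id=>5, :option_style_ids=>[4, 5, 2]},
#     {:option_id=>12, :option_style_ids=>[20]}]
```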

1

It's not exactly the output you want, but you could convert it from here.

require 'set'
src = [...your original array of hashes...]
style_ids_by_option_id = {}
src.each do |e|
  style_ids_by_option_id[e[:option_id]] ||= Set.new
  style_ids_by_option_id[e[:option_id]].merge(e[:option_style_ids])
end

This would result in a data structure like this:

{10=>#<Set: {9, 10, 11}>,
 7=>#<Set: {19}>,
 8=>#<Set: {1}>,
 5=>#<Set: {4, 5, 2}>,
 12=>#<Set: {20}>}
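
One way to convert this into the array of hashes asked for in the question is sketched below. The hash here is a hand-built, shortened stand-in for the structure above, result is an illustrative name, and the sort is optional:

```ruby
require 'set'

# Hand-built stand-in for the hash of sets shown above.
style_ids_by_option_id = {
  10 => Set.new([9, 10, 11]),
  5  => Set.new([4, 5, 2])
}

# Turn each key/set pair back into a hash, sorting the ids on the way.
result = style_ids_by_option_id.map do |option_id, style_ids|
  { :option_id => option_id, :option_style_ids => style_ids.to_a.sort }
end

result
#=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]},
#    {:option_id=>5, :option_style_ids=>[2, 4, 5]}]
```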

1 Comment

If you used an array, rather than a set, it would be src.each_with_object({}) do |h,sids|; id = h[:option_id]; sids[id] ||= []; sids[id] |= h[:option_style_ids]; end #=> {10=>[9, 10, 11], 7=>[19], 8=>[1], 5=>[4, 5, 2], 12=>[20]}.
