How to merge array of hashes with nested array

Question

I have a dataset similar to the following:

[
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 5, :option_style_ids => [4, 5]},
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 5, :option_style_ids => [4, 5]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 12, :option_style_ids => [20]},
  {:option_id => 5, :option_style_ids => [2, 5]}
]

I would like to merge the dataset for an output of:

[
  {:option_id => 10, :option_style_ids => [9, 10, 11]},
  {:option_id => 7, :option_style_ids => [19]},
  {:option_id => 8, :option_style_ids => [1]},
  {:option_id => 5, :option_style_ids => [2, 4, 5]},
  {:option_id => 12, :option_style_ids => [20]}
]

The above output strips the duplicates; however, for the hashes with option_id 5, I need it to combine the option_style_ids array values (some of which are different).

I tried:

r.group_by{|h| h[:option_id]}.map{|k,v| v.reduce(:merge)}

Unfortunately, that did not combine the option_style_ids array values.

Don't forget to select an answer if you find any to be helpful.
– Cary Swoveland, Commented Aug 13, 2015 at 18:47
Just in case you didn't know, you can upvote answers. I mention this because I noticed you didn't upvote the answer you selected.
– Cary Swoveland, Commented Aug 19, 2015 at 1:42

Cary Swoveland · Accepted Answer · 2015-08-12 03:40:04Z

This can be done using Hash#update (aka Hash#merge!), using the form that employs a block to determine the values of keys that are present in both hashes being merged.
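
As a minimal sketch of that block form (the hashes h and other below are illustrative, not taken from the question): the block is called only for keys present in both hashes, and its return value becomes the merged value.

```ruby
h     = { :a => [1, 2], :b => [3] }
other = { :a => [2, 9], :c => [4] }

# The block receives the key, the value already in h, and the incoming value.
# It is called only for :a, the one key present in both hashes.
h.update(other) { |_key, old, new| old | new }
  #=> {:a=>[1, 2, 9], :b=>[3], :c=>[4]}
```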

Code

def merge_em(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g[:option_id]=>g) do |_,o,n|
      { :option_id=>o[:option_id],
        :option_style_ids=>o[:option_style_ids] | n[:option_style_ids] }
    end
  end.values
end

Example

For the array given in the question, which I'll refer to as arr:

merge_em(arr) 
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #    {:option_id=> 7, :option_style_ids=>[19]},
  #    {:option_id=> 8, :option_style_ids=>[1]},
  #    {:option_id=> 5, :option_style_ids=>[4, 5, 2]},
  #    {:option_id=>12, :option_style_ids=>[20]}]
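
The question's expected output lists the style ids for option 5 in ascending order ([2, 4, 5]), whereas | keeps them in first-seen order. If the order matters, one possible variant sorts each array after merging. This is only a sketch, and the method name merge_em_sorted is hypothetical:

```ruby
# Same approach as merge_em, but sorts each merged :option_style_ids array.
def merge_em_sorted(arr)
  merged = arr.each_with_object({}) do |g, h|
    h.update(g[:option_id] => g) do |_, o, n|
      { :option_id        => o[:option_id],
        :option_style_ids => o[:option_style_ids] | n[:option_style_ids] }
    end
  end
  # Build new hashes rather than sorting in place, so the input is not mutated.
  merged.values.map { |e| e.merge(:option_style_ids => e[:option_style_ids].sort) }
end

merge_em_sorted([
  { :option_id => 5, :option_style_ids => [4, 5] },
  { :option_id => 8, :option_style_ids => [1] },
  { :option_id => 5, :option_style_ids => [2, 5] }
])
  #=> [{:option_id=>5, :option_style_ids=>[2, 4, 5]},
  #    {:option_id=>8, :option_style_ids=>[1]}]
```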

Explanation

To explain what's going on, let me simplify arr:

arr = [
  { :option_id => 10, :option_style_ids => [9, 10, 11] },
  { :option_id =>  7, :option_style_ids => [19] },
  { :option_id => 10, :option_style_ids => [9, 12] }
]

The steps:

enum = arr.each_with_object({})
  #=> #<Enumerator: [
  #     {:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #     {:option_id=> 7, :option_style_ids=>[19]},
  #     {:option_id=>10, :option_style_ids=>[9, 12]}
  #   ]:each_with_object({})>

We can view the elements of enum by converting it to an array:

enum.to_a
  #=> [[{:option_id=>10, :option_style_ids=>[9, 10, 11]}, {}],
  #    [{:option_id=> 7, :option_style_ids=>[19]}, {}],
  #    [{:option_id=>10, :option_style_ids=>[9, 12]}, {}]]

As you see, enum contains three elements.

The first element of enum is passed to the block and assigned to the block variables:

g,h = enum.next
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]}, {}] 
g #=> {:option_id=>10, :option_style_ids=>[9, 10, 11]} 
h #=> {}

We now perform the block calculation:

h.update(g[:option_id]=>g)
  #=> {}.update(10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]})
  #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}

update returns the new value of h.

In merging { 10=>g } into h (Ruby permits the shorthand (10=>g) for this), h does not have a key 10, so update's block is not consulted in determining the merged value for h[10].

The next element of enum is passed to the block:

g,h = enum.next
  #=> [{:option_id=>7, :option_style_ids=>[19]},
  #    {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}]
g #=>  {:option_id=>7, :option_style_ids=>[19]} 
h #=>  {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}

Notice that h has been updated.

We now perform the block calculation:

h.update(g[:option_id]=>g)
  #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]}}
  #     .update(7=>{:option_id=>7, :option_style_ids=>[19]}) 
  #=>   {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #       7=>{:option_id=> 7, :option_style_ids=>[19]}}

Again, h does not have a key 7, so update's block is not used.

The last element of enum is now passed to the block and the block calculation is performed:

g,h = enum.next
g #=> {:option_id=>10, :option_style_ids=>[9, 12]} 
h #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #     7=>{:option_id=> 7, :option_style_ids=>[19]}} 

h.update(10=>g)

This time h contains the key (10) of the hash being merged into h ({ 10=>g }). update's block is therefore called upon to determine the value for that key in the merged hash. The block is passed three arguments (the key, the old value and the new value), which we can represent as an array of three elements:

k,o,n = [10, h[10], g]   
  #=> [10, {:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #    {:option_id=>10, :option_style_ids=>[9, 12]}] 
k #=> 10 
o #=> {:option_id=>10, :option_style_ids=>[9, 10, 11]} 
n #=> {:option_id=>10, :option_style_ids=>[9, 12]}

We wish the block to return:

{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]}

which we can do most easily like so:

{ :option_id=>o[:option_id],
  :option_style_ids=>o[:option_style_ids] | n[:option_style_ids] } 
  #=> { :option_id=>10,
  #     :option_style_ids=>[9, 10, 11] | [9, 12] }
  #=> { :option_id=>10, :option_style_ids=>[9, 10, 11, 12]}

h[10] is set to this value, so now:

h #=> {10=>{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]},
  #     7=>{:option_id=>7, :option_style_ids=>[19]}}

which, because we are finished enumerating enum, is the value returned by each_with_object. The final step is to extract the values of this hash:

h.values
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11, 12]},
  #    {:option_id=> 7, :option_style_ids=>[19]}]

Although I marked another answer as correct, I really appreciate you taking the time to explain this! Do you know, in terms of performance, how your method compares to using inject?
I don't know how this compares with inject performance-wise. If I had to guess I'd put my money on inject, but it would be easy to benchmark the two.
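
A rough way to run that comparison is Ruby's standard Benchmark module. This is only a sketch: the method names with_update and with_inject, the synthetic data, and the repetition count are all assumptions, and the timings will vary with the shape of the data and the Ruby version.

```ruby
require 'benchmark'

# The each_with_object/update approach from this answer.
def with_update(arr)
  arr.each_with_object({}) do |g, h|
    h.update(g[:option_id] => g) do |_, o, n|
      { :option_id        => o[:option_id],
        :option_style_ids => o[:option_style_ids] | n[:option_style_ids] }
    end
  end.values
end

# The group_by/inject approach from sawa's answer.
def with_inject(arr)
  arr.group_by { |h| h[:option_id] }.values.map do |a|
    a.inject { |h, other| h.merge(other) { |k, v, w| k == :option_id ? v : (v + w).uniq } }
  end
end

# Synthetic data (an assumption): 10,000 hashes spread over 100 option ids.
data = Array.new(10_000) do |i|
  { :option_id => i % 100, :option_style_ids => [i % 7, i % 11, i % 13] }
end

# Prints user, system and real time for each approach.
Benchmark.bm(12) do |x|
  x.report('update:') { 20.times { with_update(data) } }
  x.report('inject:') { 20.times { with_inject(data) } }
end
```

Both methods should return the same array for the same input, so it is worth checking that before trusting any timing.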

sawa · Accepted Answer · 2015-08-11 21:30:12Z


array.group_by { |h| h[:option_id] }.values.map do |a|
  a.inject { |h, _h| h.merge(_h) { |k, v, _v| k == :option_id ? v : (v + _v).uniq } }
end
# => [
#   {:option_id=>10, :option_style_ids=>[9, 10, 11]},
#   {:option_id=>7, :option_style_ids=>[19]},
#   {:option_id=>8, :option_style_ids=>[1]},
#   {:option_id=>5, :option_style_ids=>[4, 5, 2]},
#   {:option_id=>12, :option_style_ids=>[20]}
# ]
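
To see how this works, it may help to look at the two stages separately. The sketch below uses a shortened input (an assumption, for brevity) and a hypothetical helper named merge_group:

```ruby
array = [
  { :option_id => 5, :option_style_ids => [4, 5] },
  { :option_id => 8, :option_style_ids => [1] },
  { :option_id => 5, :option_style_ids => [2, 5] }
]

# Stage 1: group_by collects the hashes that share an :option_id.
groups = array.group_by { |h| h[:option_id] }
  #=> {5=>[{:option_id=>5, :option_style_ids=>[4, 5]},
  #        {:option_id=>5, :option_style_ids=>[2, 5]}],
  #    8=>[{:option_id=>8, :option_style_ids=>[1]}]}

# Stage 2: each group is folded into one hash. The merge block keeps
# :option_id as is and concatenates, then de-duplicates, the id arrays.
def merge_group(group)
  group.inject do |acc, other|
    acc.merge(other) { |key, old, new| key == :option_id ? old : (old + new).uniq }
  end
end

groups.values.map { |group| merge_group(group) }
  #=> [{:option_id=>5, :option_style_ids=>[4, 5, 2]},
  #    {:option_id=>8, :option_style_ids=>[1]}]
```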


Philip Hallstrom · Accepted Answer · 2015-08-11 21:26:25Z


It's not exactly the output you want, but you could convert it from here.

require 'set'
src = [...your original array of hashes...]
style_ids_by_option_id = {}
src.each do |e|
  style_ids_by_option_id[e[:option_id]] ||= Set.new
  style_ids_by_option_id[e[:option_id]].merge(e[:option_style_ids])
end

This would result in a data structure like this:

{10=>#<Set: {9, 10, 11}>,
 7=>#<Set: {19}>,
 8=>#<Set: {1}>,
 5=>#<Set: {4, 5, 2}>,
 12=>#<Set: {20}>}
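
One possible way to convert that structure into the array of hashes the question asks for is sketched below. The method name to_option_hashes is hypothetical, and the style ids come out in the order they were first added to each set:

```ruby
require 'set'

# Turns { option_id => Set of style ids } into the array-of-hashes format.
def to_option_hashes(style_ids_by_option_id)
  style_ids_by_option_id.map do |option_id, style_ids|
    { :option_id => option_id, :option_style_ids => style_ids.to_a }
  end
end

to_option_hashes({ 10 => Set[9, 10, 11], 5 => Set[4, 5, 2] })
  #=> [{:option_id=>10, :option_style_ids=>[9, 10, 11]},
  #    {:option_id=>5, :option_style_ids=>[4, 5, 2]}]
```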


Cary Swoveland Over a year ago

If you used an array, rather than a set, it would be:

src.each_with_object({}) do |h, sids|
  id = h[:option_id]
  sids[id] ||= []
  sids[id] |= h[:option_style_ids]
end
  #=> {10=>[9, 10, 11], 7=>[19], 8=>[1], 5=>[4, 5, 2], 12=>[20]}
