Pulling values from one array into a subarray of another array based on criteria in both using Ruby

Question

I have an array of arrays as such:

exarray = [
  ["John Doe", "12/31/2015", "1504"],
  ["Jane Doe", "12/31/2015", "0904"],
  ["John Doe", "04/08/2015", "1300"],
  ["Jimmy Dean", "01/01/2014", "0406"],
  ["John Doe", "04/08/2015", "1402"],
  ["Jane Doe", "12/31/2015", "0908"],
  ["Jane Doe", "12/31/2015", "1045"]
]

My final goal is to have it as an array of arrays with [user, date, firsttime, maxtime]. Ex:

array = [
  ["Jane Doe", "12/31/2015", "0904", "1045"],
  # Other users, dates, times
]

I have been working on a solution for a while, and at this point I have the array above and another I have created, an array of dates and the users that have entries on them. Ex:

new_array = [["12/31/2015", [["Jane Doe"], ["John Doe"]]], ["OtherDates", [["UserA"], ["UserB"]]]]

And my thought was that I would collect the time from the original array as an array and add it with each user, then go from there to get the min/max.

new2_array = [["12/31/2015", [["Jane Doe", ["0908", "0904", "1045"]], ["John Doe", ["1504"]]]], ["OtherDates", [["UserA", [Times]], ["UserB", [Times]]]]]

However, I am having great difficulty figuring out how to match the time with user and date with my new array. I have tried several attempts with map, each and collect with no success, and I believe I have the logic wrong in my head.

new2_array = new_array.each do |n, d| 
  exarray.each do |sn, sd, st| 
    if sd ==bd && n.include?(sn) 
      st
    end
  end 
end

I have tried variations of map and collect with no success. Is there a better way to accomplish what I'm trying to do? I am very new to programming and just teaching myself Ruby, and I have read the array docs over and over looking for new ideas or inspiration on how to accomplish this.

Can you edit your question to explain more clearly the output you want, and how it relates to the input? "Other users, dates, times" is super vague. Should the output have one entry per user, or one entry per user per date, or...? It seems like Enumerable#group_by is probably the first step you want, but without more information it's pretty hard to say.
– Jordan Running, Commented Dec 31, 2015 at 21:21
@CarySwoveland I apologize greatly for the typo. I have updated this. These were imported from a CSV file and parsed from a single long string per line, which is why they are strings rather than integers. I should have typed this up in my editor and then copy-pasted it; I will do that moving forward.
– StoutPanda, Commented Dec 31, 2015 at 22:45

Cary Swoveland · Accepted Answer · 2016-01-04 23:07:30Z

arr = [
  ["John Doe",   "12/31/2015", "1504"],
  ["Jane Doe",   "12/31/2015", "0904"],
  ["John Doe",   "04/08/2015", "1300"],
  ["Jimmy Dean", "01/01/2014", "0406"],
  ["John Doe",   "04/08/2015", "1402"],
  ["Jane Doe",   "12/31/2015", "0908"],
  ["Jane Doe",   "12/31/2015", "1045"]
]

arr.each_with_object({}) { |(name,date,val),h|
  h.update(name => { date: date, val: [val.to_i] }) { |_,h1,h2|
    { date: h1[:date], val: h1[:val] + h2[:val] } } }.
      map { |name, h| [name, h[:date], *h[:val].minmax.map { |n| "%04d" % n }] }
  #=> [["John Doe",   "12/31/2015", "1300", "1504"],
  #    ["Jane Doe",   "12/31/2015", "0904", "1045"],
  #    ["Jimmy Dean", "01/01/2014", "0406", "0406"]]

I will explain how this works and will also try to describe the thinking process that led to this answer. I realize that you are new to Ruby, so it may not all make sense the first time through, or even the fourth time through.

We need to do some aggregation, or grouping, of the elements (arrays) of arr; namely, we want to group elements by name, the first element of each element (array) of arr. When you want to aggregate, think "hash", with a single key being the (unique) object by which the aggregation is done, here the name. There are two ways of doing that: build a hash from scratch (starting with an empty hash, {}) or use a method that returns a suitable hash. One such method that is applicable here is Enumerable#group_by¹,²:

arr.group_by { |a| a.first }
  #=> {"John Doe"  =>[["John Doe",   "12/31/2015", "1504"],
  #                   ["John Doe",   "04/08/2015", "1300"],
  #                   ["John Doe",   "04/08/2015", "1402"]],
  #    "Jane Doe"  =>[["Jane Doe",   "12/31/2015", "0904"],
  #                   ["Jane Doe",   "12/31/2015", "0908"],
  #                   ["Jane Doe",   "12/31/2015", "1045"]],
  #    "Jimmy Dean"=>[["Jimmy Dean", "01/01/2014", "0406"]]}
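
For comparison, here is a minimal sketch of how the group_by route might be finished. It is not the approach developed in the rest of this answer, and it assumes that one output row is wanted per user per date, so it groups on the [name, date] pair rather than on the name alone. The names rows and summary are made up for illustration.

```ruby
rows = [
  ["John Doe",   "12/31/2015", "1504"],
  ["Jane Doe",   "12/31/2015", "0904"],
  ["John Doe",   "04/08/2015", "1300"],
  ["Jimmy Dean", "01/01/2014", "0406"],
  ["John Doe",   "04/08/2015", "1402"],
  ["Jane Doe",   "12/31/2015", "0908"],
  ["Jane Doe",   "12/31/2015", "1045"]
]

# Group on the [name, date] pair so each user/date combination gets its own row.
# The times are zero-padded four-character strings, so comparing them as
# strings gives the same order as comparing them as numbers.
summary = rows.group_by { |name, date, _time| [name, date] }
              .map { |(name, date), group| [name, date, *group.map(&:last).minmax] }

summary.each { |row| p row }
# ["John Doe", "12/31/2015", "1504", "1504"]
# ["Jane Doe", "12/31/2015", "0904", "1045"]
# ["John Doe", "04/08/2015", "1300", "1402"]
# ["Jimmy Dean", "01/01/2014", "0406", "0406"]
```

Because the key is a two-element array, the block passed to map destructures it with (name, date). If the times were not all the same width, they would need converting with to_i before calling minmax.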

I could have used group_by³, but chose the first route, building the hash from scratch. Let's start with:

h = {}

To build the hash h we can use the method Hash#update (aka merge!). For example, if h = { :a=>1 }, then

h.update({ :b=>2 }) #=> { :a=>1, :b=>2 }

Ruby allows us to write this without the braces:

h.update(:b=>2) #=> { :a=>1, :b=>2 }

and to use a short form when the keys are symbols:

h.update(b: 2) #=> { a: 1, b: 2 }

so I'll do that from here on. We also have:

{ a: 1 }.update(a: 2) #=> { a: 2 }

What we want is something like:

{ a: [1] }.update(a: [2]) #=> { a: [1,2] }

We can obtain that using the form of update (see the doc) that employs a block to determine the values of keys that are present in both hashes being merged:

arr.each { |a|
  h.update(a[0]=>{ date: a[1], val: [a[2].to_i] }) { |k,h1,h2|
    { date: h1[:date], val: h1[:val] + h2[:val] } } }

Before examining this more closely, let's break the block variable a into its three elements, name, date and val. We have:

arr.each { |name,date,val|
  h.update(name=>{ date: date,  val: [val.to_i] }) { |k,h1,h2|
    { date: h1[:date], val: h1[:val] + h2[:val] } } }

each returns its receiver, arr, not the updated value of h, which is:

  h #=> {"John Doe"  =>{:date=>"12/31/2015", :val=>[1504, 1300, 1402]},
    #    "Jane Doe"  =>{:date=>"12/31/2015", :val=>[904, 908, 1045]},
    #    "Jimmy Dean"=>{:date=>"01/01/2014", :val=>[406]}}

We can step through this calculation as follows:

enum = arr.each
  #=> #<Enumerator: [["John Doe",   "12/31/2015", "1504"],
  #                  ["Jane Doe",   "12/31/2015", "0904"],
  #                  ["John Doe",   "04/08/2015", "1300"],
  #                  ["Jimmy Dean", "01/01/2014", "0406"],
  #                  ["John Doe",   "04/08/2015", "1402"],
  #                  ["Jane Doe",   "12/31/2015", "0908"],
  #                  ["Jane Doe",   "12/31/2015", "1045"]]:each>

The first value of the enumerator enum (["John Doe", "12/31/2015", "1504"]) is passed to the block and the block values are assigned, using parallel assignment (or multiple assignment). We can simulate that using Enumerator#next:

name, date, val = enum.next
  #=> ["John Doe", "12/31/2015", "1504"] 
name
  #=> "John Doe"
date
  #=> "12/31/2015" 
val
  #=> "1504"

and the block calculation is performed:

h.update(name=>{ date: date,  val: [val.to_i] })
  #=> {}.update("John Doe"=>{ :date=>"12/31/2015", :val=>[1504] })
  #=> {"John Doe"=>{:date=>"12/31/2015", :val=>[1504]}}

The return value is the updated value of h.

Since we are merging { "John Doe"=>{ :date=>"12/31/2015", :val=>[1504] } } into {}, the two hashes have no shared keys. Therefore, the block for determining values (which I've not included above) is not used.

Now the second element of enum (["Jane Doe", "12/31/2015", "0904"]) is passed to the block and the block calculation is performed:

name, date, val = enum.next
  #=> ["Jane Doe", "12/31/2015", "0904"] 
name
  #=> "Jane Doe"
date
  #=> "12/31/2015" 
val
  #=> "0904" 

h.update(name=>{ date: date,  val: [val.to_i] })
  #=> {"John Doe"=>{:date=>"12/31/2015", :val=>[1504]}}.
  #     update("Jane Doe"=>{ :date=>"12/31/2015", :val=>[904] })
  #=> {"John Doe"=>{:date=>"12/31/2015", :val=>[1504]},
  #    "Jane Doe"=>{:date=>"12/31/2015", :val=>[904]}}

Again, the block for determining values is not used because the two hashes ({"John Doe"=>{:date=>"12/31/2015", :val=>[1504]}} and { "Jane Doe"=>{ :date=>"12/31/2015", :val=>[904] } }) have no common keys.

The third value is passed to the block:

name, date, val = enum.next
  #=> ["John Doe", "04/08/2015", "1300"] 
h.update(name=>{ date: date,  val: [val.to_i] }) { |k,h1,h2|
  { date: h1[:date], val: h1[:val] + h2[:val] } }
  #=> h.update("John Doe"=>{ date: "04/08/2015",  val: [1300] }) { |k,h1,h2|
  #     { date: h1[:date], val: h1[:val] + h2[:val] } }
  #=> {"John Doe"=>{:date=>"12/31/2015", :val=>[1504, 1300]},
  #    "Jane Doe"=>{:date=>"12/31/2015", :val=>[904]}}

This time both hashes being merged have the key "John Doe", so the block is used to determine the value of "John Doe". We have⁴:

k  #=> "John Doe"
h1 #=> { date: "12/31/2015", val: [1504] } # "old" value
h2 #=> { date: "04/08/2015", val: [1300] } # "new" value

{ date: h1[:date], val: h1[:val] + h2[:val] }
  #=> { date: "12/31/2015", val: [1504] + [1300] }
  #=> { date: "12/31/2015", val: [1504, 1300] }

The calculations are similar for the remaining elements of enum. As shown above, the result is the hash:

  h #=> {"John Doe"  =>{:date=>"12/31/2015", :val=>[1504, 1300, 1402]},
    #    "Jane Doe"  =>{:date=>"12/31/2015", :val=>[904, 908, 1045]},
    #    "Jimmy Dean"=>{:date=>"01/01/2014", :val=>[406]}}
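
The block form of update can also be tried in isolation. This is only a small sketch on a toy hash; the names totals, old_val and new_val are made up for illustration.

```ruby
totals = { a: [1] }

# :a is present in both hashes, so the block is called and its return value
# (the two arrays concatenated) becomes the new value for :a.
totals.update(a: [2]) { |_key, old_val, new_val| old_val + new_val }

# :b is only in the hash being merged in, so the block is not called and
# the new value is stored as-is.
totals.update(b: [3]) { |_key, old_val, new_val| old_val + new_val }

p totals
```

After both calls, totals holds [1, 2] under :a and [3] under :b, which is the { a: [1] } to { a: [1,2] } behaviour asked for earlier.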

It remains to convert the hash to the desired array. This is actually the easy part. It involves calculating the minimum and maximum of the values held under the key :val in each inner hash, and altering the format. If integer values were desired for the min and max⁵, we could do this:

h.map { |k,v| [k, v[:date], v[:val].minmax] }
  #=> [["John Doe", "12/31/2015", [1300, 1504]],
  #    ["Jane Doe", "12/31/2015", [904, 1045]],
  #    ["Jimmy Dean", "01/01/2014", [406, 406]]]

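Since four-character strings (with leading zeroes) are desired for the min and max values, and they should be separate elements of each row rather than a nested array, two further steps are required: format each value and splat (*) the result:

h.map { |k,v| [k, v[:date], *v[:val].minmax.map { |n| "%04d" % n }] }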

As this final step is not central to the question, I will omit the explanation of the conversion.
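
For reference, the conversion can be tried on its own. The sketch below uses made-up variable names (vals, low_high, padded, nested, flat) and one user's values from the hash above; it shows what "%04d" % n produces and what the splat operator changes.

```ruby
vals = [904, 908, 1045]

low_high = vals.minmax                       # [904, 1045]

# "%04d" % n formats the integer n as at least four digits, padding with
# leading zeroes, so 904 becomes "0904".
padded = low_high.map { |n| "%04d" % n }     # ["0904", "1045"]

nested = ["Jane Doe", "12/31/2015", padded]  # min and max stay in a sub-array
flat   = ["Jane Doe", "12/31/2015", *padded] # splat spreads them into the row

p nested
p flat
```

Here flat has four elements, which is the [user, date, firsttime, maxtime] layout from the question, while nested has three.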

Lastly:

- Rather than defining h = {} first, I used the method Enumerable#each_with_object, the "object" being a hash represented by the block variable h, which is the value returned by the method. The initial value of the hash is given by the argument {}.
- Since the block variable k in the block for determining the values of keys that are in both hashes being merged is not used in the block calculation, I've replaced it with _, the customary name for an unused block variable.
- I chained the construction of the hash to its mapping to the desired array.

¹ When, as here, the receiver arr is an array, you'll want to look for methods you might use in the class Array or in the module Enumerable. Enumerable is included ("mixed in") by several classes, Array being one. Similarly, if the receiver were a hash, you'd look in the class Hash and in Enumerable.

² One day, long ago, a very wise man from the land of the Rising Sun noticed that many methods he used for arrays were very similar to those he used for hashes, ranges and other collections. He saw that he could write them so that the only difference was how the method each was implemented, so he put them all in a module he called "可算の" ("Enumerable"), and then in all the classes for different types of collections (Array, Hash, Range, Set, etc.) he added include Enumerable and a method each. After doing this, he thought, "生活は快適です" ("life is good").

³ Once you understand the approach I have taken, see if you can answer the question using group_by.

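⁴ The problem is made easier if the value of :date is the same for all elements of enum having the same name, in which case either h1[:date] or h2[:date] could be used below. Where a name appears with more than one date (as "John Doe" does in arr), h1[:date] keeps the first date encountered for that name.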

⁵ Computing the minimum and maximum values of an array is a fairly common task, so you should expect Ruby to provide a method to do that. Peruse the docs for Array for such a method. Nothing there, so try Enumerable. Bingo: Enumerable#minmax.

Thank you very much Cary. I look forward to your explanation. Happy New Years!
Thank you so much for the detailed explanation. You've opened the world of hashes to me, and the power of their use for grouping. It took me several times but I finally understand it. I actually used this to go back and refactor my whole script, and brought them in as a hash much earlier and played around a bit using datetime rather than bringing the value in as a string (though I did have to break it up into a few lines, as following it all through one was hard for me to keep up with.) Thank you again Cary, I sincerely appreciate the help and detailed explanation.

Martin Konecny · Accepted Answer · 2015-12-31 21:34:42Z


"And my thought was that I would collect the time from the original array as an array and add it with each user, then go from there to get the min/max."

I think this is a very convoluted data structure for what you are trying to accomplish. First, try to avoid arrays whose elements are of different types (a mix of strings, arrays, etc.).

Why not something like

data = {
    "12/31/2015" => [
       {username: "Jane Doe", times: ["0908","0904","1045"], OtherDates: ['']}, 
       {username: "John", times: ["0908","0904","1045"], OtherDates: ['']}
    ]
}

This way you can pick a date that you want, and then iterate over all the employee objects, and get your target data.

For example this way you can iterate to get the "max" time for each user:

data["12/31/2015"].each do |i|
    puts "Username #{i[:username]} max-time: #{i[:times].max}"
end

or accumulate for all users to get max:

data["12/31/2015"].map do |i|
    i[:times]
end.flatten.max
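
A structure like this could be built from the original flat array. The following is only a sketch: it assumes each row is [name, date, time] as in the question, leaves out the OtherDates key, uses the made-up names rows and data, and needs Ruby 2.4 or later for Hash#transform_values.

```ruby
rows = [
  ["John Doe",   "12/31/2015", "1504"],
  ["Jane Doe",   "12/31/2015", "0904"],
  ["John Doe",   "04/08/2015", "1300"],
  ["Jimmy Dean", "01/01/2014", "0406"],
  ["John Doe",   "04/08/2015", "1402"],
  ["Jane Doe",   "12/31/2015", "0908"],
  ["Jane Doe",   "12/31/2015", "1045"]
]

# First group the rows by date, then, within each date, group by user and
# keep only that user's times.
data = rows.group_by { |_name, date, _time| date }.transform_values do |day_rows|
  day_rows.group_by(&:first).map do |username, user_rows|
    { username: username, times: user_rows.map(&:last) }
  end
end

# Latest time per user on one date, using symbol keys to match the hashes.
data["12/31/2015"].each do |entry|
  puts "Username #{entry[:username]} max-time: #{entry[:times].max}"
end
```

Since the times are zero-padded strings of equal width, max on the strings agrees with the numeric maximum.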


Thank you very much for the feedback. I will keep your tip regarding arrays of different types in mind for all my programming going forward.
– StoutPanda, Commented over a year ago
