I am working on a program that will compare two .csv files. After extracting the relevant data from one of the csv files into an array of arrays, I need to combine related entries. For example, I would want to turn this array:

[["11/13/15", ["4001", "1392"], "INBOUND"], 
["11/13/15", ["4090", "540"], "INBOUND"], 
["11/13/15", ["1139", "162"], "INBOUND"], 
["11/13/15", ["1158", "64"], "INBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["4055", "448"], "OUTBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["1139", "162"], "OUTBOUND"], 
["11/13/15", ["1158", "64"], "OUTBOUND"], 
["11/13/15", ["4091", "520"], "OUTBOUND"]]

into this:

[["11/13/15", ["4001", "1392"], "INBOUND"], 
["11/13/15", ["4090", "540"], "INBOUND"], 
["11/13/15", ["1139", "162"], "INBOUND"], 
["11/13/15", ["1158", "64"], "INBOUND"], 
["11/13/15", ["4055", "1152"], "OUTBOUND"], 
["11/13/15", ["1139", "162"], "OUTBOUND"], 
["11/13/15", ["1158", "64"], "OUTBOUND"], 
["11/13/15", ["4091", "520"], "OUTBOUND"]]

If an element's items at [0], [1][0], and [2] match those of other elements, I want to replace all of them with a single new array whose item at [1][1] is the sum of their [1][1] values. If it would be easier, I can change the way the relevant data is extracted so that the item at [1] is not an array and each row has 4 items instead of 3.

  • Are the elements to combine consecutive? Commented Nov 15, 2011 at 21:51
  • The data will be sorted so that it looks like the top array if printed, so yes (if I understand your question). Commented Nov 15, 2011 at 21:55
  • Assuming tokland's answer was really what you wanted, your question has a typo for the value at [4][1][1] of the resulting array, which is the sole crucial value. It should be 1152, not 1115. I must say your question is sloppy. Commented Nov 15, 2011 at 22:24
  • @sawa Didn't notice that typo. Thanks for helping clarify my sentence as well. Commented Nov 17, 2011 at 20:54

4 Answers

I assume that the elements to group are consecutive so we can use Enumerable#chunk. Functional approach:

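# Group consecutive rows that share the same date, first id and direction.
grouped_xs = xs.chunk { |date, (id1, id2), direction| [date, id1, direction] }
grouped_xs.map do |(date, id1, direction), ary|
  # Sum the second id of every row in the group (inner names avoid shadowing).
  id2_sum = ary.map { |_date, (_id1, id2), _direction| id2.to_i }.inject(:+)
  [date, id1, id2_sum.to_s, direction]
end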

Output (you wanted 4 elements in the output array, right?):

[["11/13/15", "4001", "1392", "INBOUND"],
 ["11/13/15", "4090", "540", "INBOUND"],
 ["11/13/15", "1139", "162", "INBOUND"],
 ["11/13/15", "1158", "64", "INBOUND"],
 ["11/13/15", "4055", "1152", "OUTBOUND"],
 ["11/13/15", "1139", "162", "OUTBOUND"],
 ["11/13/15", "1158", "64", "OUTBOUND"],
 ["11/13/15", "4091", "520", "OUTBOUND"]]
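
The consecutiveness assumption matters because Enumerable#chunk only merges adjacent rows with the same key, while group_by collects matching rows wherever they occur. The following is a minimal sketch of that difference, not part of the original answer; the rows data and the key_of helper are made up for illustration.

```ruby
# Hypothetical rows: the two "4055" OUTBOUND entries are NOT adjacent.
rows = [
  ["11/13/15", ["4055", "352"], "OUTBOUND"],
  ["11/13/15", ["1139", "162"], "OUTBOUND"],
  ["11/13/15", ["4055", "448"], "OUTBOUND"]
]

# Illustrative helper: the grouping key is date, first id and direction.
def key_of(row)
  date, (id1, _qty), direction = row
  [date, id1, direction]
end

# chunk starts a new group every time the key changes between neighbours.
chunked = rows.chunk { |row| key_of(row) }.to_a

# group_by builds one group per distinct key, regardless of position.
grouped = rows.group_by { |row| key_of(row) }

# Same 4-element output shape as above, built from the group_by result.
sums = grouped.map do |(date, id1, direction), group|
  [date, id1, group.map { |r| r[1][1].to_i }.inject(:+).to_s, direction]
end

puts "chunk groups: #{chunked.size}, group_by groups: #{grouped.size}"
p sums
```

If the input is not guaranteed to be sorted, group_by (or sorting first) is probably the safer choice.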

5 Comments

@Sean: you are welcome. Just as general advice, I think it's better not to rush into selecting an answer too quickly; somebody may come up with a better solution :-)
That is some very compact Ruby @tokland :)
@SeanVikoren: Functional solutions tend to be very compact. Sometimes they are a bit harder to read than imperative ones, but conceptually they are very nice (like putting together Lego pieces instead of building everything from scratch).
@tokland I will keep that in mind for future questions (of which I am sure I will have many).
@tokland: It does have a mesmerizing beauty.

Just as another example, here is my one-liner (works with both Ruby 1.8 and 1.9):

table = [["11/13/15", ["4001", "1392"], "INBOUND"], 
["11/13/15", ["4090", "540"], "INBOUND"], 
["11/13/15", ["1139", "162"], "INBOUND"], 
["11/13/15", ["1158", "64"], "INBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["4055", "448"], "OUTBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["1139", "162"], "OUTBOUND"], 
["11/13/15", ["1158", "64"], "OUTBOUND"], 
["11/13/15", ["4091", "520"], "OUTBOUND"]]

result = table.group_by {|a, (b, c), d| [a, [b], d]}.map {|k, v| k[1] << v.map {|a| a[1][1].to_i}.inject(:+).to_s; k}
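
For readers who find the one-liner dense, the same idea can be spelled out step by step. This is a sketch of an equivalent, not the answerer's code: the combine method name and the shortened sample data are assumptions for illustration. It relies on Hash preserving insertion order (Ruby 1.9 and later) to keep the rows in their original order.

```ruby
# Illustrative helper (hypothetical name): merge rows that share
# date, first id and direction, summing the second id as integers.
def combine(table)
  # Step 1: bucket the rows by [date, id, direction].
  groups = table.group_by { |date, (id, _qty), direction| [date, id, direction] }

  # Step 2: rebuild one row per bucket with the summed quantity as a string.
  groups.map do |(date, id, direction), rows|
    total = rows.map { |row| row[1][1].to_i }.inject(:+)
    [date, [id, total.to_s], direction]
  end
end

# Shortened sample data for the sketch.
sample = [
  ["11/13/15", ["4055", "352"], "OUTBOUND"],
  ["11/13/15", ["4055", "448"], "OUTBOUND"],
  ["11/13/15", ["1139", "162"], "OUTBOUND"]
]

combined = combine(sample)
p combined
```

Unlike the one-liner, this version builds new rows instead of appending to the group key.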

1 Comment

I especially like "group_by {|a, (b, c), d|" - very nice.

This should do it:

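# Returns the index of the row in list that has the same date, id and
# direction, or nil if there is no such row.
def lookup(list, date, id, direction)
  list.index { |e| e[0] == date && e[1][0] == id && e[2] == direction }
end

b = []

a.each do |e|
  date = e[0]
  id = e[1][0]
  direction = e[2]
  i = lookup(b, date, id, direction)
  if i.nil?
    # Copy the row and its nested pair so that summing does not modify a.
    b << [date, e[1].dup, direction]
  else
    count = e[1][1].to_i
    sum = count + b[i][1][1].to_i
    b[i][1][1] = sum.to_s
  end
end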

b.each{|e| p e}

Output:

["11/13/15", ["4001", "1392"], "INBOUND"]
["11/13/15", ["4090", "540"], "INBOUND"]
["11/13/15", ["1139", "162"], "INBOUND"]
["11/13/15", ["1158", "64"], "INBOUND"]
["11/13/15", ["4055", "1152"], "OUTBOUND"]
["11/13/15", ["1139", "162"], "OUTBOUND"]
["11/13/15", ["1158", "64"], "OUTBOUND"]
["11/13/15", ["4091", "520"], "OUTBOUND"]

h = Hash.new(0)
[["11/13/15", ["4001", "1392"], "INBOUND"], 
["11/13/15", ["4090", "540"], "INBOUND"], 
["11/13/15", ["1139", "162"], "INBOUND"], 
["11/13/15", ["1158", "64"], "INBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["4055", "448"], "OUTBOUND"], 
["11/13/15", ["4055", "352"], "OUTBOUND"], 
["11/13/15", ["1139", "162"], "OUTBOUND"], 
["11/13/15", ["1158", "64"], "OUTBOUND"], 
["11/13/15", ["4091", "520"], "OUTBOUND"]]
.each{|a, (b, c), d| h[[a, b, d]] += c.to_i}
p h.map{|(a, b, d), c| [a, [b, c], d]}

will give:

[["11/13/15", ["4001", 1392], "INBOUND"],
["11/13/15", ["4090", 540], "INBOUND"],
["11/13/15", ["1139", 162], "INBOUND"],
["11/13/15", ["1158", 64], "INBOUND"],
["11/13/15", ["4055", 1152], "OUTBOUND"],
["11/13/15", ["1139", 162], "OUTBOUND"],
["11/13/15", ["1158", 64], "OUTBOUND"],
["11/13/15", ["4091", 520], "OUTBOUND"]]
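
Note that the sums above come out as Integers, while the desired output in the question keeps them as strings. If that matters, one possible adjustment is to call to_s when rebuilding each row. A sketch on a shortened copy of the data (the rows, totals and result names are assumptions for illustration):

```ruby
# Shortened sample data for the sketch.
rows = [
  ["11/13/15", ["4055", "352"], "OUTBOUND"],
  ["11/13/15", ["4055", "448"], "OUTBOUND"],
  ["11/13/15", ["4055", "352"], "OUTBOUND"],
  ["11/13/15", ["1139", "162"], "OUTBOUND"]
]

# Accumulate integer totals per [date, id, direction]; missing keys start at 0.
totals = Hash.new(0)
rows.each { |date, (id, qty), direction| totals[[date, id, direction]] += qty.to_i }

# Rebuild the nested rows, converting each total back to a string.
result = totals.map { |(date, id, direction), sum| [date, [id, sum.to_s], direction] }
p result
```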
