2

I have two files: (1) the first file contains the users of the system, I read that file into an array (2) the second file contains statistics about those users

My task is to have a user count, something like

{"user1" => 1, "user2" => 0, "user3" => 4}

This how I solved the problem

# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 1, 'user3' => 0, 'user4' => 1}

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash[names.map {|v| [v, 0]}] # to make sure every name gets a value

stats.each do |item| # basic loop to count the records
   hash[item] += 1 if hash.has_key?(item)
end

puts hash

# terminal outcome
# $ ruby example.rb 
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

I'm only curious if there is a better way than counting in a loop, specially since Ruby comes with magical powers and I come from a C background

2
  • That is exactly how I'd do it, though I might write it as: stats.each_with_object(Hash[names.map { |str| [str,0] }]) { |str, h| h[str] += 1 if h.key?(str) }. +1 for your answer. Why would you prefer a solution that must traverse stats once for each element of names to count the number of matches? Commented Apr 11, 2014 at 6:19
  • RE the pre-processing you mentioned in a comment on @Florin's answer, note ["howdy", "hi", "hello"].map {|s| s[0..2] if s.size > 3 } => ["how", nil, "hel"], whereas ["howdy", "hi", "hello"].map {|s| (s.size > 3) ? s[0..2] : s } => ["how", "hi", "hel"]. Commented Apr 11, 2014 at 17:36

4 Answers 4

2

Basically your code is the fastest you can get code to run for this, except for some minor issues.

If you have an unneeded entry marking the end of the array

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

I think you should pop it off prior to running, since it has the possibility of resulting in a weird entry, and its existence forces you to use the conditional test in your loop, slowing your code.

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]

Built-in methods exist, which reduce the amount of code to a single call, but they're slower than a loop:

stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"], "xxx"=>["xxx"]}

From there it's easy to map the resulting hash into summaries:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]

And then into a hash again:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}

or:

Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}

Using the built-in methods are efficient, and very useful when you're dealing with very large lists, because there's very little redundant looping going on.

Looping over the data like you did is also very fast, and usually faster than the built-in methods, when written correctly. Here are some benchmarks showing alternate ways of accomplishing this stuff:

require 'fruity'  # => true

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']

Hash[names.map {|v| [v, 0]}]                    # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )]              # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h               # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash  # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}

compare do 
  map_hash { Hash[names.map {|v| [v, 0]}] }
  zip_hash { Hash[names.zip([0] * names.size )] }
  to_h_hash { names.zip([0] * names.size ).to_h }
  hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%

Looking at the conditional in the loop to see how it effects the code:

require 'fruity'  # => true

NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]

def build_hash
  h = {}
  NAMES.each{ |k| h[k] = 0 }
  h
end

compare do 

  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }

  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end

# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%

While several answers suggested using count, the code is going to slow down a lot as your lists increase in size, where walking the stats array once, as you are doing, will always be linear, so stick to one of these iterative solutions.

Sign up to request clarification or add additional context in comments.

1 Comment

@nemo, you can change the answer you selected. I'm not suggesting you do so here, but just wanted you to know that you can. You see it happen fairly often on SO, usually when a better answer is submitted late, but sometimes when a flaw is discovered in the selected answer or for some other reason. When changing the selected answer it is a courtesy to leave an explanatory comment for the person whose checkmark has been erased.
1
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash.new

names.each { |name| hash[name] = stats.count(name) }

puts hash

6 Comments

Thanks!!! I feel really stupid. One follow up: since my stats array is not well formatted (username followed by garbage) I had to do a pre step ahead of your code stats = stats.map { |stat| stat[0..8] if stat.length > 8 } I hope this is not stupid!
Why do you need that?
because in real life the stats array has 'user1xx', 'user1yyy' which still should count. Do you recommend a certain resource to learn ruby?
No idea, just search Google for each problem is what I do, I guess.
@nemo Feel free to ask another question with that specific point.
|
1
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

stats.each_with_object(Hash.new(0)) { |user,hash| 
  hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}

5 Comments

I like your answer, bjhaid, but slightly prefer nemo's (and @theTinMan's), because building the entire hash before traversing stats eliminates the need to search an array (names) for each element of stats.
@CarySwoveland, from the question stats contains "xxx" which creates the need to compare with names, the question does not also state that stats would definitely contain all members of names vice versa
Good point, even if "xxx" had not been shown, though nemo's answer, which has hash[item] += 1 if hash.has_key?(item) deals with that, and the hash lookup is much faster than searching an array.
@CarySwoveland if "xxx" was not shown I would not have included the if clause, and would have returned with a very fast result
Yes, the question is ambiguous where it says, "the second file contains statistics about those users". It's not clear whether it contains anything else. I would say, drop your if if stats - names => []; else, do as nemo has done, by first constructing a counting hash.
1

You could use with map and Hash[].

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.