Ruby: counting records in an array

Question

I have two files: (1) the first file contains the users of the system, I read that file into an array (2) the second file contains statistics about those users

My task is to have a user count, something like

{"user1" => 1, "user2" => 0, "user3" => 4}

This how I solved the problem

# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 1, 'user3' => 0, 'user4' => 1}

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash[names.map {|v| [v, 0]}] # to make sure every name gets a value

stats.each do |item| # basic loop to count the records
   hash[item] += 1 if hash.has_key?(item)
end

puts hash

# terminal outcome
# $ ruby example.rb 
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

I'm only curious if there is a better way than counting in a loop, specially since Ruby comes with magical powers and I come from a C background

That is exactly how I'd do it, though I might write it as: stats.each_with_object(Hash[names.map { |str| [str,0] }]) { |str, h| h[str] += 1 if h.key?(str) }. +1 for your answer. Why would you prefer a solution that must traverse stats once for each element of names to count the number of matches? — Cary Swoveland
– Cary Swoveland, Commented Apr 11, 2014 at 6:19
RE the pre-processing you mentioned in a comment on @Florin's answer, note ["howdy", "hi", "hello"].map {|s| s[0..2] if s.size > 3 } => ["how", nil, "hel"], whereas ["howdy", "hi", "hello"].map {|s| (s.size > 3) ? s[0..2] : s } => ["how", "hi", "hel"]. — Cary Swoveland
– Cary Swoveland, Commented Apr 11, 2014 at 17:36

the Tin Man · Accepted Answer · 2014-04-11 02:18:16Z

Basically your code is the fastest you can get code to run for this, except for some minor issues.

If you have an unneeded entry marking the end of the array

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

I think you should pop it off prior to running, since it has the possibility of resulting in a weird entry, and its existence forces you to use the conditional test in your loop, slowing your code.

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]

Built-in methods exist, which reduce the amount of code to a single call, but they're slower than a loop:

stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"], "xxx"=>["xxx"]}

From there it's easy to map the resulting hash into summaries:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]

And then into a hash again:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}

or:

Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}

Using the built-in methods are efficient, and very useful when you're dealing with very large lists, because there's very little redundant looping going on.

Looping over the data like you did is also very fast, and usually faster than the built-in methods, when written correctly. Here are some benchmarks showing alternate ways of accomplishing this stuff:

require 'fruity'  # => true

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']

Hash[names.map {|v| [v, 0]}]                    # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )]              # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h               # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash  # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}

compare do 
  map_hash { Hash[names.map {|v| [v, 0]}] }
  zip_hash { Hash[names.zip([0] * names.size )] }
  to_h_hash { names.zip([0] * names.size ).to_h }
  hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%

Looking at the conditional in the loop to see how it effects the code:

require 'fruity'  # => true

NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]

def build_hash
  h = {}
  NAMES.each{ |k| h[k] = 0 }
  h
end

compare do 

  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }

  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end

# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%

While several answers suggested using count, the code is going to slow down a lot as your lists increase in size, where walking the stats array once, as you are doing, will always be linear, so stick to one of these iterative solutions.

@nemo, you can change the answer you selected. I'm not suggesting you do so here, but just wanted you to know that you can. You see it happen fairly often on SO, usually when a better answer is submitted late, but sometimes when a flaw is discovered in the selected answer or for some other reason. When changing the selected answer it is a courtesy to leave an explanatory comment for the person whose checkmark has been erased.

Florin Asăvoaie · Accepted Answer · 2014-04-11 00:23:44Z

1

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash.new

names.each { |name| hash[name] = stats.count(name) }

puts hash

edited Apr 11, 2014 at 0:23

answered Apr 11, 2014 at 0:17

Florin Asăvoaie

9181 gold badge6 silver badges20 bronze badges

6 Comments

user1781626 Over a year ago

Thanks!!! I feel really stupid. One follow up: since my stats array is not well formatted (username followed by garbage) I had to do a pre step ahead of your code stats = stats.map { |stat| stat[0..8] if stat.length > 8 } I hope this is not stupid!

Florin Asăvoaie Over a year ago

Why do you need that?

user1781626 Over a year ago

because in real life the stats array has 'user1xx', 'user1yyy' which still should count. Do you recommend a certain resource to learn ruby?

Florin Asăvoaie Over a year ago

No idea, just search Google for each problem is what I do, I guess.

Mark Thomas Over a year ago

@nemo Feel free to ask another question with that specific point.

|

bjhaid · Accepted Answer · 2014-04-11 00:13:10Z

1

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

stats.each_with_object(Hash.new(0)) { |user,hash| 
  hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}

answered Apr 11, 2014 at 0:13

bjhaid

9,8502 gold badges41 silver badges47 bronze badges

5 Comments

Cary Swoveland Over a year ago

I like your answer, bjhaid, but slightly prefer nemo's (and @theTinMan's), because building the entire hash before traversing stats eliminates the need to search an array (names) for each element of stats.

bjhaid Over a year ago

@CarySwoveland, from the question stats contains "xxx" which creates the need to compare with names, the question does not also state that stats would definitely contain all members of names vice versa

Cary Swoveland Over a year ago

Good point, even if "xxx" had not been shown, though nemo's answer, which has hash[item] += 1 if hash.has_key?(item) deals with that, and the hash lookup is much faster than searching an array.

bjhaid Over a year ago

@CarySwoveland if "xxx" was not shown I would not have included the if clause, and would have returned with a very fast result

Cary Swoveland Over a year ago

Yes, the question is ambiguous where it says, "the second file contains statistics about those users". It's not clear whether it contains anything else. I would say, drop your if if stats - names => []; else, do as nemo has done, by first constructing a counting hash.

Guilherme · Accepted Answer · 2014-04-11 00:45:00Z

1

You could use with map and Hash[].

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash

answered Apr 11, 2014 at 0:45

Guilherme

1,14610 silver badges17 bronze badges

Collectives™ on Stack Overflow

Ruby: counting records in an array

4 Answers 4

1 Comment

6 Comments

5 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

6 Comments

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related