How can I filter an array of hashes to get only the keys in another array?

Question

I'm trying to get a subset of keys for each hash in an array.

The hashes are actually much larger, but I figured this is easier to understand:

events = [
  {
    id: 2,
    start: "3:30",
    break: 30,
    num_attendees: 14
  },
  {
    id: 3,
    start: "3:40",
    break: 40,
    num_attendees: 4
  },
  {
    id: 4,
    start: "4:40",
    break: 10,
    num_attendees: 40
  }
]

I want to keep only the id and start keys (and their values) in each hash.

I've tried:

return_keys = ['id','start']
return_array = events.select{|key,val|  key.to_s.in? return_keys}

but this returns an empty array.

Andrew Marshall · Accepted Answer · 2012-03-02 18:57:41Z

53

This should do what you want:

events.map do |hash|
  hash.select do |key, value|
    [:id, :start].include? key
  end
end

A potentially faster (but somewhat less pretty) solution:

events.map do |hash|
  { id: hash[:id], start: hash[:start] }
end

If you need return_keys to be dynamic:

return_keys = [:id, :start]
events.map do |hash|
  {}.tap do |new_hash|
    return_keys.each do |key|
      new_hash[key] = hash[key]
    end
  end
end
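On Ruby 2.6 and later, Array#to_h accepts a block, so the dynamic version can be written without tap. This is a minimal sketch, assuming Ruby 2.6+ and symbol keys as in the question; like the tap version, it adds a nil entry for any requested key that a hash lacks.

```ruby
events = [
  { id: 2, start: "3:30", break: 30, num_attendees: 14 },
  { id: 3, start: "3:40", break: 40, num_attendees: 4 }
]
return_keys = [:id, :start]

# Array#to_h with a block (Ruby 2.6+) builds each new hash from [key, value] pairs.
return_array = events.map do |hash|
  return_keys.to_h { |key| [key, hash[key]] }
end

p return_array
```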

Note that in your code, select is called on the array, so it picks out whole elements (hashes) from the array; it doesn't change the hashes contained within it.
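To see why the original attempt comes back empty: Array#select yields each element, which here is an entire hash. A Hash is not auto-splatted into block parameters, so key receives the whole hash and val is nil, and key.to_s never equals 'id' or 'start'. The sketch below uses include? in place of ActiveSupport's in? so it runs on plain Ruby, then shows a minimal fix that keeps the string return_keys by comparing key.to_s inside Hash#select.

```ruby
events = [
  { id: 2, start: "3:30", break: 30, num_attendees: 14 },
  { id: 3, start: "3:40", break: 40, num_attendees: 4 }
]
return_keys = ['id', 'start']

# Original attempt (include? instead of in?): the block sees whole hashes.
yielded = []
broken = events.select do |key, val|
  yielded << [key.class, val] # records what the block actually received
  return_keys.include?(key.to_s)
end
p yielded # each entry is [Hash, nil]
p broken  # empty array

# Minimal fix: select inside each hash, converting symbol keys to strings.
fixed = events.map do |hash|
  hash.select { |key, _val| return_keys.include?(key.to_s) }
end
p fixed
```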

If you're concerned about performance, I've benchmarked all of the solutions listed here (the benchmark code was a link in the original post):

                user     system      total        real
amarshall 1  0.140000   0.000000   0.140000 (  0.140316)
amarshall 2  0.060000   0.000000   0.060000 (  0.066409)
amarshall 3  0.100000   0.000000   0.100000 (  0.101469)
tadman 1     0.140000   0.010000   0.150000 (  0.145489)
tadman 2     0.110000   0.000000   0.110000 (  0.111838)
mu           0.130000   0.000000   0.130000 (  0.128688)
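The benchmark code behind these numbers is not reproduced on this page. The following is a hypothetical sketch of how a similar comparison could be set up with Ruby's standard Benchmark module; the data size, iteration count, and labels are assumptions, and timings will vary by machine and Ruby version. It also checks that all approaches agree before timing them.

```ruby
require 'benchmark'
require 'set'

# Hypothetical sample data: 1,000 hashes shaped like the question's events.
events = Array.new(1_000) do |i|
  { id: i, start: "3:30", break: 30, num_attendees: 14 }
end
return_keys = [:id, :start]
keepers = return_keys.to_set

approaches = {
  'select + Array#include?' => -> { events.map { |h| h.select { |k, _| return_keys.include?(k) } } },
  'literal hash'            => -> { events.map { |h| { id: h[:id], start: h[:start] } } },
  'tap + each'              => -> { events.map { |h| {}.tap { |n| return_keys.each { |k| n[k] = h[k] } } } },
  'select + Set#include?'   => -> { events.map { |h| h.select { |k, _| keepers.include?(k) } } }
}

# Sanity check: every approach should produce the same result for this data.
results = approaches.transform_values(&:call)
raise 'approaches disagree' unless results.values.uniq.size == 1

# Time each approach over repeated runs.
Benchmark.bm(25) do |x|
  approaches.each do |label, fn|
    x.report(label) { 100.times { fn.call } }
  end
end
```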

edited Mar 2, 2012 at 18:57

answered Mar 2, 2012 at 18:01

Andrew Marshall


5 Comments

tadman Over a year ago

For N hashes in events, M keys in each hash, and P keys in the key array, this runs in O(MNP) time, which could be cripplingly slow.

Andrew Marshall Over a year ago

@tadman Though, I suppose it's really O(NP)? I don't think there's anything faster than that. Assuming P is very small though, it shouldn't really affect the time complexity.

Andrew Marshall Over a year ago

I've also updated to include code for when return_keys needs to be dynamic.

pedalpete Over a year ago

Awesome, Andrew! #2 is significantly faster, and I don't think that code is particularly un-pretty. I don't need the return keys to be dynamic at the moment, and the hashes can get pretty big, so I'll go for door #2.

tadman Over a year ago

Nice work. #2 is the optimal solution if the keys selected are small and predictable. These probably have wildly different properties if the numbers involved grow large, e.g. N=10e6, M=100, P=50, but that is only an academic consideration if the values are known to be small.

mu is too short · Accepted Answer · 2012-03-02 18:48:28Z

38

If you happen to be using Rails (or don't mind pulling in all or part of ActiveSupport) then you could use Hash#slice:

return_array = events.map { |h| h.slice(:id, :start) }

Hash#slice does some extra work under the covers, but it is probably fast enough that you won't notice for small hashes, and the clarity is quite nice.
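Since Ruby 2.5, Hash#slice is part of core Ruby, so on those versions the same one-liner works without ActiveSupport. A minimal sketch, assuming Ruby 2.5 or later; note that slice omits any requested key that a hash does not have, rather than adding a nil entry.

```ruby
events = [
  { id: 2, start: "3:30", break: 30, num_attendees: 14 },
  { id: 3, break: 40 } # hypothetical entry with no :start key
]

# Core Hash#slice (Ruby 2.5+) returns a new hash containing only the given keys that exist.
return_array = events.map { |h| h.slice(:id, :start) }
p return_array
```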

answered Mar 2, 2012 at 18:48

mu is too short


1 Comment

Andrew Marshall Over a year ago

Actually, you need to require 'active_support/core_ext' if you're not in Rails. Core extensions need to be loaded explicitly so just require 'active_support' doesn't work. (I say this because the latter is what most would consider "pulling in all of ActiveSupport".)

tadman · Accepted Answer · 2012-03-02 18:21:38Z

2

A better solution is to use a hash as your index instead of doing a linear array lookup for each key:

events = [{id:2, start:"3:30",break:30,num_attendees:14},{id:3, start:"3:40",break:40,num_attendees:4},{id:4, start:"4:40",break:10,num_attendees:40}]

return_keys = [ :id, :start ]

# Compute a quick hash to extract the right values: { key => true }
key_index = Hash[return_keys.collect { |key| [ key, true ] }]

return_array = events.collect do |event|
  event.select do |key, value|
    key_index[key]
  end
end

# => [{:id=>2, :start=>"3:30"}, {:id=>3, :start=>"3:40"}, {:id=>4, :start=>"4:40"}]

I've adjusted this to use symbols as the key names to match your definition of events.

This can be further improved by letting return_keys drive the iteration directly:

events = [{id:2, start:"3:30",break:30,num_attendees:14},{id:3, start:"3:40",break:40,num_attendees:4},{id:4, start:"4:40",break:10,num_attendees:40}]

return_keys = [ :id, :start ]

return_array = events.collect do |event|
  Hash[
    return_keys.collect do |key|
      [ key, event[key] ]
    end
  ]
end

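For this data the result is the same. If the subset you're extracting tends to be much smaller than the original hash, this might be the best approach, though any requested key that a hash lacks will appear in the result with a nil value.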
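With hypothetical data in which one hash lacks :start, the two approaches in this answer diverge: the select-based version drops the missing key, while the return_keys-driven version includes it with a nil value. A small sketch reusing both techniques:

```ruby
events = [{ id: 2, start: "3:30" }, { id: 3, break: 40 }] # second hash has no :start
return_keys = [:id, :start]
key_index = Hash[return_keys.collect { |key| [key, true] }]

# select-based: keeps only the requested keys that are actually present
by_select = events.collect { |event| event.select { |key, _| key_index[key] } }

# return_keys-driven: always emits every requested key, using nil when absent
by_keys = events.collect { |event| Hash[return_keys.collect { |key| [key, event[key]] }] }

p by_select # second element has only :id
p by_keys   # second element has :id and :start (nil)
```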

answered Mar 2, 2012 at 18:21

tadman


1 Comment

Andrew Marshall Over a year ago

In case you're curious, I benchmarked all the solutions here and posted the results in my answer :).

Cary Swoveland · Accepted Answer · 2016-10-28 00:31:41Z

Considering that efficiency appears to be a concern, I would suggest the following.

Code

require 'set'

def keep_keys(arr, keeper_keys)
  keepers = keeper_keys.to_set
  arr.map { |h| h.select { |k,_| keepers.include?(k) } }
end

This uses Hash#select, which, unlike Enumerable#select, returns a hash. I've converted keeper_keys to a set for fast lookups.

Examples

arr = [{ id:2, start: "3:30", break: 30 },
       { id: 3, break: 40, num_attendees: 4 },
       { break: 10, num_attendees: 40 }]

keep_keys arr, [:id, :start]
  #=> [{:id=>2, :start=>"3:30"}, {:id=>3}, {}] 
keep_keys arr, [:start, :break]
  #=> [{:start=>"3:30", :break=>30}, {:break=>40}, {:break=>10}] 
keep_keys arr, [:id, :start, :cat]
  #=> [{:id=>2, :start=>"3:30"}, {:id=>3}, {}] 
keep_keys arr, [:start]
  #=> [{:start=>"3:30"}, {}, {}] 
keep_keys arr, [:cat, :dog]
  #=> [{}, {}, {}]
