Mark and remove duplicates in an array of objects using a specific property for comparison

Question

I have an array of objects that looks like this:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02"
  },
  .....
]

I need to search through the array and mark duplicates based on the "field name" property. For this, I could use something like uniq { |i| i["field name"] }.

However, for any duplicate items that are found, the item that ends up not being deleted needs to have a property added to it: multiple => true. I do not care which object ends up being the one that stays in the array, so long as it is marked with this property. So, running the function on the example above might produce:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
    "multiple" => true
  },

  .....
]

Besides the removal of duplicates, I also need to be sure that the array's order is not affected by the function.

What is the best way to go about this?

@Yu Hao I see you deleted the references to the docs in my question. Why should I not be doing this?
– Luke, Commented Jul 20, 2015 at 13:43
They were just links to the reference manuals for Array and Hash, probably two of the most commonly used classes. You are not referring to any particular methods, either. Any Rubyist knows where to find them, so I think they add little information to your question.
– Yu Hao, Commented Jul 20, 2015 at 13:50

sawa · Accepted Answer · 2015-07-20 15:01:55Z

Using this array:

a = [
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02",
  },
]

This code:

file_names = {}
a.select do |h|
  k = h["field name"]
  if file_names[k]
    file_names[k]["multiple"] = true
    false
  else
    file_names[k] = h
    true
  end
end

will give:

[
  {
    "field name"   => "Account number",
    "data type"    => "number",
    "mneumonic"    => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name"   => "Warning",
    "data type"    => "code",
    "mneumonic"    => "WARN1",
    "field number" => "034:01",
    "multiple"     => true
  }
]
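
For reuse, the same idea can be wrapped in a method. This is only a sketch: the name mark_and_dedupe, its parameters, and the shortened sample records are made up for illustration and are not part of the answer above. It also shows that select leaves the original array's length alone, while the kept hashes are the same objects as in the input and so carry the flag there too.

```ruby
# Hypothetical helper (name and signature are assumptions, not from the answer).
# Keeps the first hash seen for each value of `key`, in the original order,
# and flags it with "multiple" => true when a later duplicate is dropped.
def mark_and_dedupe(records, key)
  first_seen = {}
  records.select do |record|
    value = record[key]
    if first_seen.key?(value)
      first_seen[value]["multiple"] = true
      false # drop the duplicate
    else
      first_seen[value] = record
      true  # keep the first occurrence
    end
  end
end

# Shortened, made-up sample records.
records = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

result = mark_and_dedupe(records, "field name")
result.map { |r| r["mneumonic"] } # => ["ACTNUM", "WARN1"]
result.last["multiple"]           # => true
records.size                      # => 3 (select does not modify its receiver)
records[1]["multiple"]            # => true (the kept hash is the same object)
```

If the input hashes must stay untouched, one option is to store a copy (record.dup) in first_seen and return those copies instead.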

Piotr Kruczek · Accepted Answer · 2015-07-20 14:32:09Z

Here's a pretty straightforward solution:

array # => your array of objects
used_names = []
multiple_names = []
array.each do |hash|
  name = hash['field name']
  if used_names.include? name
    multiple_names << name
    array.delete hash
  else
    used_names << name
  end
end
array.each do |hash|
  if multiple_names.include? hash['field name']
    hash['multiple'] = true
  end
end

edited Jul 20, 2015 at 14:32

answered Jul 20, 2015 at 13:46


4 Comments

Luke Over a year ago

This doesn't appear to work. I think it's because you're using delete inside the loop.

Piotr Kruczek Over a year ago

It didn't work because I've accidentally named the keys field_name instead of field name (with a space). Oh, and your test data has errors in the first object (unclosed " and no commas between values). I've updated and tested the answer.

Luke Over a year ago

Have you tested it on larger data sets than the one I provided? Won't using delete inside of each cause it to skip indices?

Piotr Kruczek Over a year ago

The iteration will continue; the only place it might not work is when you have more than one copy of exactly the same object in your array. If that may be the case, simply change each to each_with_index and use delete_at(index) instead of delete; this way you will only ever delete the object at the specified index.
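
Luke's concern about skipped elements can be checked with a small experiment. The sketch below uses made-up sample data (three consecutive records sharing a name) and assumes the usual index-based behaviour of Array#each: after a delete, the remaining elements shift left, so the element that moves into the current slot is never visited. The second half shows one possible alternative using reject!, which is a swapped-in technique and not what the answer proposes.

```ruby
# Made-up sample data: three consecutive records share a name.
sample = lambda do
  [
    { "field name" => "Warning", "mneumonic" => "WARN1" },
    { "field name" => "Warning", "mneumonic" => "WARN2" },
    { "field name" => "Warning", "mneumonic" => "WARN3" },
    { "field name" => "Other",   "mneumonic" => "OTH" },
  ]
end

# 1. Deleting from the array while iterating over it.
array = sample.call
used_names = []
visited = []
array.each do |hash|
  visited << hash["mneumonic"]
  name = hash["field name"]
  if used_names.include?(name)
    array.delete(hash) # the remaining elements shift left by one
  else
    used_names << name
  end
end
visited                          # => ["WARN1", "WARN2", "OTH"] (WARN3 was never visited)
array.map { |h| h["mneumonic"] } # => ["WARN1", "WARN3", "OTH"] (a duplicate survives)

# 2. Letting reject! handle the removal instead.
fixed = sample.call
first_seen = {}
fixed.reject! do |hash|
  name = hash["field name"]
  if first_seen.key?(name)
    first_seen[name]["multiple"] = true
    true  # drop this duplicate
  else
    first_seen[name] = hash
    false # keep the first occurrence
  end
end
fixed.map { |h| h["mneumonic"] } # => ["WARN1", "OTH"]
fixed.first["multiple"]          # => true
```

With the three-record example from the question, the delete-inside-each version happens to give the right result, which may be why the problem did not show up in testing.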

vikram7 · Accepted Answer · 2015-07-20 14:10:49Z

This version counts how many times each "field name" value occurs, then sets "multiple" to true on every hash whose name occurs more than once and to false on all the others.

field_name_counts = Hash.new 0

array.each do |hash|
  field_name = hash["field name"]
  field_name_counts[field_name] += 1
end

array.each do |hash|
  field_name = hash["field name"]
  if field_name_counts[field_name] > 1
    hash["multiple"] = true
  else
    hash["multiple"] = false
  end
end

answered Jul 20, 2015 at 14:10


1 Comment

Luke Over a year ago

This won't delete the duplicates
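
One way to address that, sketched here as an assumption and not as part of the answer, is to combine the counting pass with Array#uniq, which keeps the first element for each block value and preserves order. The sample records are shortened and made up. Unlike the answer's code, this sketch only adds "multiple" where it is true.

```ruby
# Shortened, made-up sample records.
array = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

# First pass: count how often each field name occurs.
field_name_counts = Hash.new(0)
array.each { |hash| field_name_counts[hash["field name"]] += 1 }

# Second pass: keep the first hash per field name, in the original order,
# and flag the ones whose name occurred more than once.
unique = array.uniq { |hash| hash["field name"] }
unique.each do |hash|
  hash["multiple"] = true if field_name_counts[hash["field name"]] > 1
end

unique.map { |h| h["mneumonic"] } # => ["ACTNUM", "WARN1"]
unique.last["multiple"]           # => true
```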

Matt Brictson · Accepted Answer · 2015-07-20 15:14:19Z

This solution builds a new array with duplicates excluded. For each item in the original array, it checks whether an item with the same name has already been added to the result. If so, it sets existing["multiple"] = true on that earlier item and skips the current one.

This has the desired effect of omitting duplicates in the new array and marking the originals.

unique_data = data.each_with_object([]) do |item, result|
  if (existing = result.find { |i| i["field name"] == item["field name"] })
    existing["multiple"] = true
    next
  end
  result << item
end

answered Jul 20, 2015 at 15:14


Cary Swoveland · Accepted Answer · 2015-07-20 19:32:10Z

Provided you are using Ruby v1.9+ (where hashes are guaranteed to maintain key-insertion order), you can use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. a is the array given by @sawa.

a.each_with_object({}) do |f,g|
  g.update(f["field name"]=>f) { |_,h| h.merge("multiple"=>true) }
end.values
  #=> [{"field name"=>"Account number", "data type"=>"number",
  #     "mneumonic"=>"ACTNUM", "field number"=>"027"},
  #    {"field name"=>"Warning", "data type"=>"code", "mneumonic"=>"WARN1",
  #     "field number"=>"034:01", "multiple"=>true}]
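
A side effect worth noting, shown in the sketch below with shortened, made-up records: because the block calls merge (not merge!), the flagged entry is a new hash, so the hashes in a are not modified. Entries that were never duplicated are still the original objects.

```ruby
# Shortened, made-up sample records.
a = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

result = a.each_with_object({}) do |f, g|
  # On a key collision the block receives (key, old_value, new_value);
  # only the old value is used, and merge returns a flagged copy of it.
  g.update(f["field name"] => f) { |_, h| h.merge("multiple" => true) }
end.values

result.map { |h| h["mneumonic"] } # => ["ACTNUM", "WARN1"]
result.last["multiple"]           # => true
a[1].key?("multiple")             # => false (the hash in `a` was not touched)
result.first.equal?(a[0])         # => true  (unflagged entries are the same objects)
```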

answered Jul 20, 2015 at 19:32
