1

I have an array of objects that looks like this:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02"
  },
  .....
]

I need to search through the array and mark duplicates based on the "field name" property. For this, I could use something like uniq { |i| i["field name"] }.

However, for any duplicate items that are found, the item that ends up not being deleted needs to have a property added to it: "multiple" => true. I do not care which object ends up being the one that stays in the array, so long as it is marked with this property. So, running the function on the example above might produce:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
    "multiple" => true
  },

  .....
]

Besides the removal of duplicates, I also need to be sure that the array's order is not affected by the function.

What is the best way to go about this?

4 Comments

  • @Yu Hao I see you deleted the references to the docs in my question. Why should I not be doing this? Commented Jul 20, 2015 at 13:43
  • It's just links to the reference manual of Array and Hash, probably two of the most commonly used classes. You are not referring to any particular methods, either. Any Rubyist knows where to find them, so I think it adds little information to your question. Commented Jul 20, 2015 at 13:50
  • Your array is invalid. Commented Jul 20, 2015 at 15:00
  • @sawa added commas, sorry. Commented Jul 20, 2015 at 15:01

5 Answers

1

Using this array:

a = [
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02",
  },
]

This code:

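field_names = {}
a.select do |h|
  k = h["field name"]
  if field_names[k]
    # Duplicate: flag the first hash seen with this name and drop this one.
    field_names[k]["multiple"] = true
    false
  else
    # First occurrence: remember it and keep it.
    field_names[k] = h
    true
  end
end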

will give:

[
  {
    "field name"   => "Account number",
    "data type"    => "number",
    "mneumonic"    => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name"   => "Warning",
    "data type"    => "code",
    "mneumonic"    => "WARN1",
    "field number" => "034:01",
    "multiple"     => true
  }
]
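
One thing to be aware of: the block above sets "multiple" on the hash objects stored in a, so the elements of the original array change as well. If that matters, a non-mutating variant might look like the sketch below. The method name dedupe_by_field_name is made up for illustration, the sample data is abbreviated from the question, and the sketch relies on group_by keeping keys in first-seen order (Ruby 1.9+).

```ruby
# Hypothetical helper (not part of the answer above): returns a new array
# and leaves the input hashes untouched.
def dedupe_by_field_name(records)
  # group_by returns a Hash keyed by field name; keys keep first-seen order.
  records.group_by { |h| h["field name"] }.map do |_name, group|
    first = group.first
    # merge builds a new hash, so the original record is not modified.
    group.size > 1 ? first.merge("multiple" => true) : first
  end
end

# Abbreviated sample data based on the question.
a = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

result = dedupe_by_field_name(a)
```

Here result holds two records, with the marked one being a copy, while the hashes in a carry no "multiple" key.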

1

Here's a pretty straightforward solution:

array # => your array of objects
used_names = []
multiple_names = []
array.each do |hash|
  name = hash['field name']
  if used_names.include? name
    multiple_names << name
    array.delete hash
  else
    used_names << name
  end
end
array.each do |hash|
  if multiple_names.include? hash['field name']
    hash['multiple'] = true
  end
end

4 Comments

This doesn't appear to work. I think it's because you're using delete inside the loop.
It didn't work because I've accidentally named the keys field_name instead of field name (with a space). Oh, and your test data has errors in the first object (unclosed " and no commas between values). I've updated and tested the answer.
Have you tested it on larger data sets than the one I provided? Won't using delete inside of each cause it to skip indices?
The iteration will continue; the only place it might not work is when you have more than one exactly identical object in your array. If that may be the case, simply change each to each_with_index and use delete_at(index) instead of delete; this way you will only ever delete the object at the specified index.
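
The skipping concern raised in these comments can be checked directly. The sketch below uses made-up sample data with three records sharing a name (WARN3 is an assumption, not from the question). It first deletes inside each, then shows one possible alternative that keeps the answer's two-pass structure but removes duplicates with reject!, which does not shift elements under the iterator. Using delete_at(index) inside each_with_index shifts elements the same way as delete does.

```ruby
# Sample data: three records share the name "Warning" (WARN3 is illustrative).
array = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
  { "field name" => "Warning",        "mneumonic" => "WARN3" },
]

# Deleting inside each: once WARN2 is removed, WARN3 slides into its slot
# and the iteration never visits it.
skipped = array.map(&:dup)
seen = []
skipped.each do |hash|
  name = hash["field name"]
  if seen.include?(name)
    skipped.delete(hash)
  else
    seen << name
  end
end
skipped_mnemonics = skipped.map { |h| h["mneumonic"] }

# Alternative sketch: reject! decides per element and removes in place.
fixed = array.map(&:dup)
used_names = []
multiple_names = []
fixed.reject! do |hash|
  name = hash["field name"]
  if used_names.include?(name)
    multiple_names << name
    true   # drop the duplicate
  else
    used_names << name
    false  # keep the first occurrence
  end
end
fixed.each do |hash|
  hash["multiple"] = true if multiple_names.include?(hash["field name"])
end
fixed_mnemonics = fixed.map { |h| h["mneumonic"] }
```

With this data, skipped_mnemonics still contains WARN3, while fixed_mnemonics contains only ACTNUM and WARN1.
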
0

This version counts how many times each "field name" occurs, then sets "multiple" to true on every hash whose name occurs more than once and to false on the rest.

field_name_counts = Hash.new 0

array.each do |hash|
  field_name = hash["field name"]
  field_name_counts[field_name] += 1
end

array.each do |hash|
  field_name = hash["field name"]
  if field_name_counts[field_name] > 1
    hash["multiple"] = true
  else
    hash["multiple"] = false
  end
end

1 Comment

This won't delete the duplicates
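
As the comment notes, the counting pass marks records but leaves the duplicates in place. A minimal sketch of one way to combine it with the uniq call from the question follows. It assumes the abbreviated sample data shown and that mutating the kept hashes is acceptable; the variable names counts and deduped are illustrative.

```ruby
# Abbreviated sample data based on the question.
array = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

# Count occurrences of each field name (Hash.new(0) defaults missing keys to 0).
counts = Hash.new(0)
array.each { |hash| counts[hash["field name"]] += 1 }

# uniq with a block keeps the first element for each field name, in order.
deduped = array.uniq { |hash| hash["field name"] }

# Mark only the survivors whose name appeared more than once.
deduped.each do |hash|
  hash["multiple"] = true if counts[hash["field name"]] > 1
end
```

Unlike the code above, this leaves unique records without a "multiple" key instead of setting it to false, which matches the output requested in the question.
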
0

This solution builds a new array with duplicates excluded. For each item in the original array, it checks whether an item with the same name has already been added to the result. If so, it sets existing["multiple"] = true on that earlier item and skips the current one.

This has the desired effect of omitting duplicates in the new array and marking the originals.

unique_data = data.each_with_object([]) do |item, result|
  if (existing = result.find { |i| i["field name"] == item["field name"] })
    existing["multiple"] = true
    next
  end
  result << item
end


0

Provided you are using Ruby v1.9+ (where hashes are guaranteed to maintain key-insertion order), you can use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. Here a is the array given by @sawa.

a.each_with_object({}) do |f,g|
  g.update(f["field name"]=>f) { |_,h| h.merge("multiple"=>true) }
end.values
  #=> [{"field name"=>"Account number", "data type"=>"number",
  #     "mneumonic"=>"ACTNUM", "field number"=>"027"},
  #    {"field name"=>"Warning", "data type"=>"code", "mneumonic"=>"WARN1",
  #     "field number"=>"034:01", "multiple"=>true}] 
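
Because the block calls merge rather than merge!, the marked entry is a new hash, so the hashes stored in a are not modified. A quick check of that property, as a sketch with abbreviated sample data (the names result and originals_untouched are just for illustration):

```ruby
# Abbreviated sample data based on the question.
a = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" },
]

result = a.each_with_object({}) do |f, g|
  # On a key collision the block receives the key and the value already
  # stored; merge returns a marked copy of that stored hash.
  g.update(f["field name"] => f) { |_, h| h.merge("multiple" => true) }
end.values

# None of the input hashes gained a "multiple" key.
originals_untouched = a.none? { |h| h.key?("multiple") }
```

Records that were never duplicated are not copied, so result[0] is the same object as a[0].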
