processing array with duplicates

Question

I have an array

a = ['A', 'B', 'B', 'C', 'D', 'D']

and I have to go through all the elements, do something depending on whether each one is the last occurrence or not, and remove the element after processing it.

The elements are already sorted if that matters.

I'm looking for something efficient. Any suggestions?

Here is what I have so far. THIS WORKS AS EXPECTED, but I'm not sure it is very efficient.

a = ['A', 'B', 'B', 'C', 'D', 'D']

while !a.empty?
  b = a.shift

  unless a.count(b) > 0
    p "unique #{b}"
  else
    p "duplicate #{b}"
  end
end

and it produces

"unique A"
"duplicate B"
"unique B"
"unique C"
"duplicate D"
"unique D"
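In the loop above, a.count(b) scans the whole remaining array on every iteration, so the total work grows quadratically with the array length. Because the input is sorted, any remaining copy of b has to be the new first element of a. A minimal sketch under that assumption (process_sorted! is a hypothetical name) keeps the shift-and-remove structure but only looks at a.first:

```ruby
# Hypothetical helper: the question's loop, with the O(n) a.count(b)
# replaced by a constant-time look at the new front of the array.
# Assumes a is sorted, so any other copy of b must come immediately next.
def process_sorted!(a)
  lines = []
  until a.empty?
    b = a.shift                  # remove the element being processed
    if !a.empty? && a.first == b # another copy follows, so not the last one
      lines << "duplicate #{b}"
    else
      lines << "unique #{b}"
    end
  end
  lines
end

a = ['A', 'B', 'B', 'C', 'D', 'D']
process_sorted!(a).each { |line| p line }
p a # the array has been emptied: []
```

This prints the same six lines as the original loop. If the input were not sorted, the a.first check would no longer be enough.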

Thanks

Please share the code you've written so far and where you got stuck. Stack Overflow isn't a good place to ask other people to write code for you. – user94559, Commented Jun 14, 2017 at 6:56
Sure, I will update the code. Thank you so much for your support. I do appreciate your kind help. – Sig, Commented Jun 14, 2017 at 6:57
If the array is sorted and you really need an efficient solution, just iterate over it with each_with_index and check whether a[i] equals a[i-1] or a[i+1]. Best speed/memory solution. – Pavel Mikhailyuk, Commented Jun 14, 2017 at 7:47

Gagan Gami · Accepted Answer · 2017-06-14 11:47:24Z

4

Simple way:

array = ["A", "B", "B", "C", "D", "D"]

array.group_by { |e| e }.each do |key, value|
  *duplicate, uniq = value
  duplicate.each do |e|
    puts "Duplicate #{e}"
  end
  puts "Unique #{uniq}"
end
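The splat in *duplicate, uniq = value is what separates the last copy from the earlier ones. A quick check of what group_by produces and how the splat assignment splits a group:

```ruby
array = ["A", "B", "B", "C", "D", "D"]

# group_by maps each value to the array of its copies, e.g. "B" => ["B", "B"].
groups = array.group_by { |e| e }

# Splat assignment: the last element goes to uniq, the rest to duplicate.
*duplicate, uniq = groups["B"]
p duplicate # => ["B"]
p uniq      # => "B"

# A one-element group leaves duplicate empty.
*duplicate, uniq = groups["A"]
p duplicate # => []
p uniq      # => "A"
```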

Following Stefan's comment and suggestion, a shorter way is:

array.chunk_while(&:==).each do |*duplicate, uniq|
  duplicate.each do |e|
    puts "Duplicate #{e}"
  end
  puts "Unique #{uniq}"
end


# Both versions produce the same output:
Unique A
Duplicate B
Unique B
Unique C
Duplicate D
Unique D

edited Jun 14, 2017 at 11:47

answered Jun 14, 2017 at 7:13

Gagan Gami



9 Comments

max pleaner Over a year ago

it'd be faster to build uniq_elements like this: array.reduce(Hash.new(0)) { |memo, x| memo[x] += 1; memo }.select { |k,v| v > 0 }.keys
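For reference, the reduce expression in the comment above builds a hash mapping each element to its number of occurrences. Every count is at least 1, so the trailing .select { |k,v| v > 0 } keeps every key. A small sketch of the counting step; the each_with_object form is an equivalent alternative added here, not part of the comment:

```ruby
array = ["A", "B", "B", "C", "D", "D"]

# Counting with reduce, as in the comment: the block must return memo.
counts = array.reduce(Hash.new(0)) { |memo, x| memo[x] += 1; memo }

# Equivalent with each_with_object, which returns the hash automatically.
counts2 = array.each_with_object(Hash.new(0)) { |x, memo| memo[x] += 1 }

p counts == counts2                     # => true
p counts["B"]                           # => 2
p counts.select { |_k, v| v > 0 }.keys  # => ["A", "B", "C", "D"]
```

On Ruby 2.7 and later, Enumerable#tally returns the same counts hash directly.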

Sig Over a year ago

I must process all items, so I have to loop through all of them and, for each one, act differently depending on whether the element is unique or not (see updated code above).

Stefan Over a year ago

@GaganGami "Unique B / Duplicate B" should be the other way round, i.e. "Duplicate B" and then "Unique B". "Unique" means that no similar element is coming afterwards.

Gagan Gami Over a year ago

@Stefan : ok got it

Stefan Over a year ago

Nice one, I like splats :-) Even shorter: .each do |key, (*duplicate, uniq)|. And you can even remove that unused key argument if you replace group_by { |e| e } with chunk_while(&:==)
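A sketch of Stefan's first suggestion, destructuring each group directly in the block parameters (key is renamed _key here only to mark it as unused), plus a check of how group_by and chunk_while differ when equal values are not adjacent:

```ruby
array = ["A", "B", "B", "C", "D", "D"]
lines = []

# group_by yields [key, group] pairs; the parentheses in the block
# parameters destructure each group, so the last copy lands in uniq
# and any earlier copies in duplicate.
array.group_by { |e| e }.each do |_key, (*duplicate, uniq)|
  duplicate.each { |e| lines << "Duplicate #{e}" }
  lines << "Unique #{uniq}"
end
puts lines

# The two grouping methods differ on unsorted input: group_by collects
# equal values wherever they are, chunk_while only groups adjacent runs.
p ["B", "A", "B"].group_by { |e| e }.values # => [["B", "B"], ["A"]]
p ["B", "A", "B"].chunk_while(&:==).to_a    # => [["B"], ["A"], ["B"]]
```

So the chunk_while(&:==) version relies on the input being sorted, which the question guarantees.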


Stefan · Accepted Answer · 2017-06-14 08:13:20Z

1

Based on your code and expected output, I think this is an efficient way to do what you're looking for:

a = ['A', 'B', 'B', 'C', 'D', 'D']

a.each_index do |i|
  if i < a.length - 1 && a[i+1] == a[i]
    puts "This is not the last occurrence of #{a[i]}"
  else
    puts "This is the last occurrence of #{a[i]}"
  end
end

# Output:
# This is the last occurrence of A
# This is not the last occurrence of B
# This is the last occurrence of B
# This is the last occurrence of C
# This is not the last occurrence of D
# This is the last occurrence of D

But I want to reiterate the importance of the wording in my output versus yours. This is not about whether the value is unique or not in the input. It seems to be about whether the value is the last occurrence within the input or not.
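The check above relies on the array being sorted, because it only compares each element with its right neighbour. If the input might not be sorted, one possible sketch (a different technique from this answer; last_occurrence_flags is a hypothetical name) records the last index of every value first and then compares indices:

```ruby
# Hypothetical helper: for each element, report whether it is the last
# occurrence of its value. Works for unsorted input in two O(n) passes.
def last_occurrence_flags(a)
  last_index = {}
  a.each_with_index { |e, i| last_index[e] = i } # later indices overwrite earlier ones
  a.each_with_index.map { |e, i| [e, last_index[e] == i] }
end

p last_occurrence_flags(['A', 'B', 'B', 'C', 'D', 'D'])
p last_occurrence_flags(['A', 'B', 'D', 'C', 'B', 'D'])
```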

edited Jun 14, 2017 at 8:13

Stefan


answered Jun 14, 2017 at 8:05

user94559


2 Comments

Stefan Over a year ago

for i in 0...a.length do can be expressed as a.each_index do |i|

user94559 Over a year ago

Thanks, @Stefan. I've edited the code. I don't write much Ruby, so I definitely appreciate the pointer!

sschmeck · Accepted Answer · 2017-06-16 06:00:20Z

1

Quite similar to the answer of @GaganGami but using chunk_while.

a.chunk_while { |x, y| x == y }
 .each do |*list, last|
   list.each { |e| puts "duplicate #{e}" }
   puts "unique #{last}"
 end

chunk_while splits the array into subarrays wherever the element changes.

['A', 'B', 'B', 'C', 'D', 'D'].chunk_while { |x, y| x == y }.to_a
# => [["A"], ["B", "B"], ["C"], ["D", "D"]]
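Enumerable#slice_when is the mirror image of chunk_while: it starts a new chunk where its block returns true, rather than where it returns false. Splitting where neighbours differ therefore gives the same runs:

```ruby
a = ['A', 'B', 'B', 'C', 'D', 'D']

by_chunk = a.chunk_while { |x, y| x == y }.to_a
by_slice = a.slice_when { |x, y| x != y }.to_a

p by_chunk             # => [["A"], ["B", "B"], ["C"], ["D", "D"]]
p by_chunk == by_slice # => true
```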

edited Jun 16, 2017 at 6:00

answered Jun 14, 2017 at 10:50

sschmeck



Cary Swoveland · Accepted Answer · 2017-06-16 16:07:03Z

The OP stated that the elements of a are sorted, but that is not required by the method I propose. It also preserves array order, which could be important for the "do something" code performed on each element to be removed. It does so with no performance penalty over the case where the array is already sorted.

For the array

['A', 'B', 'D', 'C', 'B', 'D']

I assume that some code is to be executed for 'A', 'C', the second 'B' and the second 'D', in that order, after which a new array

['B', 'D']

is returned.

Code

# Placeholder for the per-element work.
def do_something(e) end

def process_last_dup(a)
  a.dup.
    tap do |b|
      b.each_with_index.
        reverse_each.
        uniq(&:first).
        reverse_each { |_,i| do_something(a[i]) }.
        each { |_,i| b.delete_at(i) }
    end
end

Example

a = ['A', 'B', 'B', 'C', 'D', 'D']

process_last_dup(a)
  #=> ["B", "D"]

Explanation

The steps are as follows.

b = a.dup
  #=> ["A", "B", "B", "C", "D", "D"]
c = b.each_with_index
  #=> #<Enumerator: ["A", "B", "B", "C", "D", "D"]:each_with_index>
d = c.reverse_each
  #=> #<Enumerator: #<Enumerator: ["A",..., "D"]:each_with_index>:reverse_each>

Notice that d can be thought of as a "compound" enumerator. We can convert it to an array to see the elements it will generate and pass to uniq.

d.to_a
  #=> [["D", 5], ["D", 4], ["C", 3], ["B", 2], ["B", 1], ["A", 0]]

Continuing,

e = d.uniq(&:first)
  #=> [["D", 5], ["C", 3], ["B", 2], ["A", 0]]
e.reverse_each { |_,i| do_something(a[i]) }

reverse_each is used so that do_something is first executed for 'A', then for the second 'B', and so on.

e.each { |_,i| b.delete_at(i) }
b #=> ["B", "D"]

If a is to be modified in place, replace "a.dup." with "a.".

Readers may have noticed that the code I gave at the beginning uses Object#tap: tap's block variable b, which initially equals a.dup, is returned after it has been modified within tap's block. This avoids explicitly setting b = a.dup at the beginning and returning b at the end, as I've done in my step-by-step explanation. Both approaches yield the same result, of course.

The doc for Enumerable#uniq does not specify whether the first element is kept, but it does reference Array#uniq, which does keep the first. If there is any uneasiness about that, one could replace the first reverse_each with to_a.reverse, so that Array#uniq would be used.
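A sketch of the Array#uniq variant. An Enumerator does not respond to reverse, so the pairs are converted with to_a first. The name process_last_dup_array is hypothetical, and for easier checking this version takes the per-element work as a block instead of calling do_something; intermediate variables replace the method chain:

```ruby
# Hypothetical variant of process_last_dup that goes through Array#uniq.
# The caller supplies the per-element work as a block.
def process_last_dup_array(a)
  b = a.dup
  # [element, index] pairs, highest index first (an Array, not an Enumerator).
  pairs = b.each_with_index.to_a.reverse
  # Array#uniq keeps the first pair per element, i.e. its last occurrence.
  lasts = pairs.uniq(&:first)
  # Process in original array order, then delete from the highest index down
  # so the remaining indices stay valid.
  lasts.reverse_each { |_, i| yield a[i] }
  lasts.each { |_, i| b.delete_at(i) }
  b
end

seen = []
result = process_last_dup_array(['A', 'B', 'D', 'C', 'B', 'D']) { |e| seen << e }
p result # => ["B", "D"]
p seen   # => ["A", "C", "B", "D"]
```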
