
I have to compare two CSV files which are produced by an e-commerce site. The files are always similar, except that the newer ones contain a different number of items, because the catalogue changes every week.

Example of the CSV file:

sku_code, description, price, url    
001, product one, 100, www.something.com/1 
002, product two, 150, www.something.com/2

By comparing two files extracted on different days, I would like to produce a list of products which have been discontinued and another list of products which have been added.

My index should be the sku_code, which is unique within the catalogue.

I've been using this code from Stack Overflow:

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)

#find new products
File.open("new_products.txt","w"){ |f| f.write((f2-f1).join("\n")) }

#find old products
File.open("deleted_products.txt","w"){ |f| f.write((f1-f2).join("\n")) }

My issue

It works well, except in one case: when one of the fields after the sku_code is changed, the product is considered "new" (e.g. a change of price), even though for my needs it's the same product.

What is the smartest way to compare only the sku_code instead of the whole row?
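For example, if only the price of product 001 changes between the two extractions, the line-level set difference reports it in both output lists (sample rows made up, headers omitted):

```ruby
f1 = ["001, product one, 100", "002, product two, 150"]  # old feed
f2 = ["001, product one, 120", "002, product two, 150"]  # new feed, price changed

f2 - f1  # => ["001, product one, 120"]  reported as "new"
f1 - f2  # => ["001, product one, 100"]  reported as "deleted"
```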

2 Comments

  • Is there a reason you're not using the csv gem? Commented Jul 25, 2013 at 14:12
  • @Duck1337 No reason in particular! I've just begun learning Ruby and I'm not familiar with the many gems in existence. Commented Jul 25, 2013 at 23:36

3 Answers


No need to use a CSV library, because you are not interested in the actual values (except the sku_code). I'd put each line into a hash with sku_code as the key, compare the sku_codes, and then retrieve the values from those hashes.

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
f1_hash = f1[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
f2_hash = f2[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}

#find new products
new_product_keys = f2_hash.keys - f1_hash.keys
new_products = new_product_keys.map {|sku_code| f2_hash[sku_code] }

#find old products
old_product_keys = f1_hash.keys - f2_hash.keys
old_products = old_product_keys.map {|sku_code| f1_hash[sku_code] }

# write new products to file
File.open("new_products.txt","w") do |f|
  f.write "#{f2.first}\n"
  f.write new_products.join("\n")
end

#write old products to file
File.open("deleted_products.txt","w") do |f|
  f.write "#{f1.first}\n"
  f.write old_products.join("\n")
end

The first line of each CSV file contains only the column names, so I skipped it (f1[1..-1]) and added it back later when writing the output file (f.write "#{f1.first}\n").

Tested it for two imaginary CSV files.


EDIT: Accidentally computed old_products using the new_product_keys, which was a typo. Thanks to those who tried to edit my answer (but were unfortunately rejected).


3 Comments

  • I've tried the code but the products files remain empty, except for the first line with the column names.
  • Possible issues: the delimiter used in the file is the pipe (|) instead of the comma. Another mistake on my part is that the SKU_CODE is alphanumeric rather than simply numeric (e.g. UILED19X11), so the regex \d is probably not matching as it should.
  • Confirmed! I've changed the regex and it works beautifully: f1_hash = f1[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\w+/]] = line; hash}
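Putting the two fixes from these comments together, a sketch that takes the SKU as everything before the first pipe rather than matching a regex — the helper name and sample rows are made up:

```ruby
# Build a { sku => full_line } hash from feed lines (header already removed).
# Splitting on the delimiter makes no assumptions about the SKU format.
def sku_hash(lines, delimiter = "|")
  lines.each_with_object({}) do |line, hash|
    sku = line.split(delimiter).first.strip
    hash[sku] = line
  end
end

old_lines = ["UILED19X11|lamp|100|www.something.com/1",
             "ABC123|chair|50|www.something.com/2"]
new_lines = ["UILED19X11|lamp|120|www.something.com/1",
             "XYZ900|table|80|www.something.com/3"]

old_hash = sku_hash(old_lines)
new_hash = sku_hash(new_lines)

added   = (new_hash.keys - old_hash.keys).map { |k| new_hash[k] }
removed = (old_hash.keys - new_hash.keys).map { |k| old_hash[k] }
# added   => ["XYZ900|table|80|www.something.com/3"]
# removed => ["ABC123|chair|50|www.something.com/2"]
```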
require 'csv'

# Paths to the two feed files
DOA = 'oldfeed.csv'
DOB = 'newfeed.csv'

# This is a CSV file that will hold the unique values.
# You don't need to create this file; Ruby will make it for you.
DOC = 'finished_product.csv'

# CSV.read puts each file into an array of row arrays.
holder_1 = CSV.read(DOA)
holder_2 = CSV.read(DOB)

# Assuming the sku_code is the first field of each row
# (holder_1[0] is the header row):
# holder_1[1][0]  #=> "001"
# holder_1[2][0]  #=> "002"

This should get you moving; you need two while loops and an if statement. Do you need more info, or are you okay with this?

If you want a CSV file to show your results, it's easier to use the csv gem.
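A sketch of how the comparison might continue with the csv gem, keying on column 0 of each row; the inline sample data here stands in for the two feed files (CSV.read on the file paths returns the same array of rows):

```ruby
require 'csv'

# CSV.parse works on a string; CSV.read('oldfeed.csv') returns the
# same array-of-rows for a file on disk. Sample data is made up.
old_rows = CSV.parse("sku_code,description,price\n001,product one,100\n002,product two,150\n")
new_rows = CSV.parse("sku_code,description,price\n002,product two,160\n003,product three,90\n")

old_skus = old_rows.drop(1).map { |row| row[0] }  # drop(1) skips the header row
new_skus = new_rows.drop(1).map { |row| row[0] }

added_skus   = new_skus - old_skus  # => ["003"]
removed_skus = old_skus - new_skus  # => ["001"]
```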



Assuming that you don't have a big performance concern, I think you want to strive for the least amount of code. Even if performance is an issue, I'd start with the simplest approach and refine from there based on your needs.

I think using the CSV gem is a fine idea, because it's one less thing you have to write code for. That said, here is another way to approach this problem. Note that the diff function below works on either an array or a hash and is independent of how the key is defined. It uses an array internally for the key lookup, but changing that to use a hash is straightforward.

l1a = "001, product one, 100, www.something.com/1"
l2 = "002, product two, 150, www.something.com/2"
l1b = "001, product one, 120, www.something.com/1"
l3 = "003, product three, 100, www.something.com/1"
l4 = "004, product four, 100, www.something.com/1"

file_old = [l1a, l2, l3]
file_new = [l1b, l2, l4]

sku = -> (record) do
  record.split(',')[0]
end

def diff(set1, set2, keyproc)
  set2_keys = set2.collect {|e| keyproc.call(e)}
  set1.reject {|e| set2_keys.include?(keyproc.call(e))}
end

puts diff(file_old, file_new, sku)
# prints "003, product three, 100, www.something.com/1"
puts diff(file_new, file_old, sku)
# prints "004, product four, 100, www.something.com/1"
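The hash-based lookup mentioned above might look like this — a sketch using a Set for O(1) membership tests instead of scanning an array; diff_fast and the shortened sample rows are illustrative:

```ruby
require 'set'

# Same contract as diff, but builds a Set of keys once, so each
# membership test is O(1) instead of a linear scan of set2's keys.
def diff_fast(set1, set2, keyproc)
  set2_keys = set2.map { |e| keyproc.call(e) }.to_set
  set1.reject { |e| set2_keys.include?(keyproc.call(e)) }
end

sku = ->(record) { record.split(',')[0] }

file_old = ["001, product one, 100", "002, product two, 150", "003, product three, 100"]
file_new = ["001, product one, 120", "002, product two, 150", "004, product four, 100"]

diff_fast(file_old, file_new, sku)  # => ["003, product three, 100"]
diff_fast(file_new, file_old, sku)  # => ["004, product four, 100"]
```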

