Array#sort_by clearly is the right method, but here's a reminder of how Array#sort would be used here:
a.sort do |s1,s2|
  t1 = (s1.is_a? Array) ? s1.first : s1
  t2 = (s2.is_a? Array) ? s2.first : s2
  t1 <=> t2
end.map {|e| (e.is_a? Array) ? e.join(', ') : e }
#=> ["apple", "grapes, berries", "peach", "pear"]
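For comparison, a sort_by version of the same idea might look like the sketch below. The input array `a` is an assumption (it is not shown in this answer) and was chosen only so that it is consistent with the output above.

```ruby
# Assumed input: not shown in the answer, picked to match the output above.
a = ["pear", ["grapes", "berries"], "apple", "peach"]

# The sort key is the first element of an array, or the string itself.
sorted = a.sort_by { |e| e.is_a?(Array) ? e.first : e }

# Join two-element arrays into a single string, as in the sort version.
result = sorted.map { |e| e.is_a?(Array) ? e.join(', ') : e }
#=> ["apple", "grapes, berries", "peach", "pear"]
```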
@theTinMan pointed out that sort is quite a bit slower than sort_by here, and gave a reference that explains why. I've been meaning to see how the Benchmark module is used, so I took the opportunity to compare the two methods for the problem at hand. I used @Rafa's solution for sort_by and mine for sort.
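One likely contributor to the gap can be seen by counting how often the sort key is extracted: sort_by computes the key once per element, while a sort block extracts keys on every comparison. The sketch below is only an illustration of that point, not the referenced explanation itself; `data`, `key_of`, and the counters are hypothetical names.

```ruby
# Hypothetical illustration: count how many times the sort key is extracted.
# Half the elements are two-element arrays, half are plain strings.
data = Array.new(1_000) { |i| i.odd? ? [i.to_s, "x"] : i.to_s }.shuffle

calls = 0
key_of = lambda do |e|
  calls += 1
  e.is_a?(Array) ? e.first : e
end

data.sort_by { |e| key_of.call(e) }
sort_by_calls = calls   # one key extraction per element

calls = 0
data.sort { |x, y| key_of.call(x) <=> key_of.call(y) }
sort_calls = calls      # two key extractions per comparison

puts "sort_by: #{sort_by_calls} key extractions"
puts "sort:    #{sort_calls} key extractions"
```

The exact count for sort depends on the input order, but it grows with the number of comparisons rather than with the number of elements.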
For testing, I constructed an array of 100 random samples (each with 10,000 random elements to be sorted) in advance, so the benchmarks would not include the time needed to construct the samples (which was not insignificant). 8,000 of the 10,000 elements were random strings of 8 distinct lowercase letters (Array#sample draws without replacement). The other 2,000 elements were 2-tuples of the form [str1, str2], where str1 and str2 were each random strings of the same kind. I benchmarked with other parameters, but the bottom-line results did not vary significantly.
require 'benchmark'
# n: total number of items to sort
# m: number of two-tuples [str1, str2] among n items to sort
# n-m: number of strings among n items to sort
# k: length of each string in samples
# s: number of sorts to perform when benchmarking
def make_samples(n, m, k, s)
  s.times.with_object([]) { |_, a| a << test_array(n,m,k) }
end
def test_array(n,m,k)
  a = ('a'..'z').to_a
  r = []
  (n-m).times { r << a.sample(k).join }
  m.times { r << [a.sample(k).join, a.sample(k).join] }
  r.shuffle!
end
# Here's what the samples look like:
make_samples(6,2,4,4)
#=> [["bloj", "izlh", "tebz", ["lfzx", "rxko"], ["ljnv", "tpze"], "ryel"],
# ["jyoh", "ixmt", "opnv", "qdtk", ["jsve", "itjw"], ["pnog", "fkdr"]],
# ["sxme", ["emqo", "cawq"], "kbsl", "xgwk", "kanj", ["cylb", "kgpx"]],
# [["rdah", "ohgq"], "bnup", ["ytlr", "czmo"], "yxqa", "yrmh", "mzin"]]
n = 10000 # total number of items to sort
m = 2000 # number of two-tuples [str1, str2] (n-m strings)
k = 8 # length of each string
s = 100 # number of sorts to perform
samples = make_samples(n,m,k,s)
Benchmark.bm('sort_by'.size) do |bm|
  bm.report 'sort_by' do
    # The block variable is named sample so it does not shadow the outer s.
    samples.each do |sample|
      sample.sort_by { |f| f.class == Array ? f.first : f }
    end
  end
  bm.report 'sort' do
    samples.each do |sample|
      sample.sort do |s1,s2|
        t1 = (s1.is_a? Array) ? s1.first : s1
        t2 = (s2.is_a? Array) ? s2.first : s2
        t1 <=> t2
      end
    end
  end
end
              user     system      total        real
sort_by   1.360000   0.000000   1.360000 (  1.364781)
sort      4.050000   0.010000   4.060000 (  4.057673)
Though it was never in doubt, @theTinMan was right! I did a few other runs with different parameters, but sort_by consistently thumped sort by a similar ratio (roughly 3 to 1 in the run above).
Note the "system" time is zero for sort_by. In other runs it was sometimes zero for sort. The values were always zero or 0.010000, leading me to wonder what's going on there. (I ran these on a Mac.)
For readers unfamiliar with Benchmark, Benchmark.bm takes an optional label width as its argument. It sets the width of the label column, and therefore how far the header row (user system...) is indented. bm.report takes a row label as an argument.
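As a minimal sketch of those two calls, the example below derives the label width from the longest label and passes each label to bm.report. The `labels`, `width`, and `data` names are hypothetical, and the timed blocks are stand-ins rather than the benchmark above.

```ruby
require 'benchmark'

labels = ['sort_by', 'sort']
width  = labels.map(&:size).max   # label column width passed to Benchmark.bm

# Stand-in data: 1,000 random floats (not the samples used above).
data = Array.new(1_000) { rand }

# Benchmark.bm prints one row per report and returns the measurements.
results = Benchmark.bm(width) do |bm|
  bm.report(labels[0]) { data.sort_by { |x| x } }
  bm.report(labels[1]) { data.sort }
end
```

Using the longest label's size as the width keeps the timing columns lined up under the header.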