2

I'd like to sort the first array:

 filenames = ["z.pdf", "z.txt", "a.pdf", "z.rf", "a.rf","a.txt", "z.html", "a.html"]

by the order of the extensions in the following array:

 extensions = ["html", "txt", "pdf", "rf"]

using sort_by. But when I try:

 filenames.sort_by { |x| extensions.index x.split('.')[1] }

I get:

 ["a.html", "z.html", "z.txt", "a.txt", "a.pdf", "z.pdf", "z.rf", "a.rf"]

The filenames with the "txt" and "rf" extensions come out in the wrong order within their groups. I tried to work out how sort_by handles a tuple as the sort key, but I couldn't find the source code for sort_by.

How can I sort one array by another array using sort_by?


Edit:

The result should look like:

 ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]
1
  • "please use Ruby's built in File class for this"? You mean like my answer? Commented May 17, 2013 at 20:19

7 Answers

3

Sort by the index of the extensions array, then the filename:

filenames = ["z.pdf", "z.txt", "a.pdf", "z.rf", "a.rf","a.txt", "z.html", "a.html"]
extensions = ["html", "txt", "pdf", "rf"]

p sorted = filenames.sort_by{|fn| [extensions.index(File.extname(fn)[1..-1]), fn]} #[1..-1] chops off the dot
#=> ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]

1 Comment

The OP appears to want to sort by extension-list major, filename minor.
3
sorted = filenames.sort_by do |filename|
  extension = File.extname(filename).gsub(/^\./, '')
  [
    extensions.index(extension) || -1,
    filename,
  ]
end
p sorted
# => ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]

This uses the fact that the sort order of arrays is determined by the sort order of their elements, in the order they are defined. That means that if sort_by returns an array, the first element of the array is the primary sort order, the second element is the secondary sort order, and so on. We exploit that to sort by extension major, filename minor.
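As a minimal sketch of that comparison rule, the snippet below applies `<=>` directly to a few sort keys of the same shape as the ones the block returns. The specific keys are made up for illustration.

```ruby
# Arrays compare element by element, left to right.
first_wins = [0, "z.html"] <=> [1, "a.txt"]  # first elements differ: 0 < 1
tie_broken = [1, "a.txt"]  <=> [1, "z.txt"]  # first elements tie, so the filenames decide
identical  = [2, "a.pdf"]  <=> [2, "a.pdf"]  # every element ties

p first_wins  # => -1
p tie_broken  # => -1
p identical   # => 0
```

The second comparison is the case the question's code never reached: with only the extension index as the key, "a.txt" and "z.txt" compare as equal and sort_by is free to leave them in either order.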

If an extension is not in the list, this code puts it first by virtue of the || -1 fallback. To put an unknown extension last, replace -1 with extensions.size.
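A small sketch of both choices, assuming a made-up file "b.xyz" whose extension is not in the list. The helper name sort_by_extension and its unknown_rank parameter are hypothetical, introduced only for this illustration.

```ruby
extensions = ["html", "txt", "pdf", "rf"]
filenames  = ["z.txt", "b.xyz", "a.html", "a.txt"]  # "b.xyz" has an unlisted extension

# unknown_rank is the position assigned to extensions missing from the list.
def sort_by_extension(filenames, extensions, unknown_rank)
  filenames.sort_by do |filename|
    extension = File.extname(filename).sub(/\A\./, "")
    [extensions.index(extension) || unknown_rank, filename]
  end
end

unknown_first = sort_by_extension(filenames, extensions, -1)
unknown_last  = sort_by_extension(filenames, extensions, extensions.size)

p unknown_first  # => ["b.xyz", "a.html", "a.txt", "z.txt"]
p unknown_last   # => ["a.html", "a.txt", "z.txt", "b.xyz"]
```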

1 Comment

File.extname, that's it!
3

How about:

>> filenames.sort.group_by{ |s| File.extname(s)[1..-1] }.values_at(*extensions).flatten
[
    [0] "a.html",
    [1] "z.html",
    [2] "a.txt",
    [3] "z.txt",
    [4] "a.pdf",
    [5] "z.pdf",
    [6] "a.rf",
    [7] "z.rf"
]

group_by comes from Enumerable, and is a nice tool in our collection toolbox, letting us group things by "like" attributes. In this case, it's grouping on the file's extension, retrieved using File.extname, minus its leading '.'.

It's worth understanding why File.extname matters here. A file name can have multiple sections delimited by '.', for various reasons. Simply using split('.') is a recipe for disaster at that point, because the code following the split will have to deal with more than two strings. Other file names don't contain a delimiting '.' at all. File.extname makes a reasonable attempt to retrieve the last extension found in the name, so it is a saner way of dealing with file names and extensions. From the documentation:

File.extname("test.rb")         #=> ".rb"
File.extname("a/b/d/test.rb")   #=> ".rb"
File.extname("foo.")            #=> ""
File.extname("test")            #=> ""
File.extname(".profile")        #=> ""
File.extname(".profile.sh")     #=> ".sh"
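To make the contrast with split('.') concrete, here is a short sketch using three hypothetical file names (they are not from the question):

```ruby
names = ["archive.tar.gz", "README", "report.pdf"]  # hypothetical names

# split('.')[1] takes whatever follows the first dot, or nil if there is no dot.
by_split   = names.map { |name| name.split(".")[1] }
# File.extname returns the last extension, dot included, or "" if there is none.
by_extname = names.map { |name| File.extname(name) }

p by_split    # => ["tar", nil, "pdf"]
p by_extname  # => [".gz", "", ".pdf"]
```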

values_at comes from Hash, and extracts the values from a hash, in the order of the keys/parameters passed in. It's great for this sort of situation because we can force the order of the values to match the order of keys. When you have a huge hash and want to cherry-pick certain values from it in one action, values_at is the tool to grab. If you need your "by-extensions" order to be different, change extensions and the output will automagically reflect that as a result of values_at.
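A brief sketch of the intermediate steps, assuming a smaller made-up input in which "b.xyz" has an unlisted extension and no file uses "pdf". It shows what values_at returns in each of those cases.

```ruby
filenames  = ["z.txt", "a.html", "a.txt", "b.xyz"]  # made-up input
extensions = ["html", "txt", "pdf"]

groups = filenames.sort.group_by { |s| File.extname(s)[1..-1] }
picked = groups.values_at(*extensions)

p groups          # => {"html"=>["a.html"], "txt"=>["a.txt", "z.txt"], "xyz"=>["b.xyz"]}
p picked          # => [["a.html"], ["a.txt", "z.txt"], nil]
p picked.flatten  # => ["a.html", "a.txt", "z.txt", nil]
```

Two things follow from this: files whose extension is not listed ("b.xyz") are left out, and a listed extension with no files ("pdf") contributes a nil. Appending .compact after flatten would drop that nil if it is unwanted.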

7 Comments

This discards any file which has an extension not in the list, but if that's alright, it's very pretty and easy to understand.
Yeah, File.extname, that's it! I'm basically a noob anyway, I just learned it today, in the process of being here.
This is a cool answer, but it doesn't use sort_by. Can the answer be modified to use sort_by?
Here's my question back: why use sort_by when it will result in a slower execution? It's important to know when to use a tool and why to use it and there is no benefit using sort_by with this algorithm because it has added complexity and overhead.
Can you tell me what is the expected runtime of your algorithm vs using sort_by?
1
filenames.sort_by{|f| f.split(".").map{|base, ext|
  [extensions.index(ext), base]
}}

2 Comments

This looks like it sorts on the filename first and then the extension giving the result - ["a.html", "a.pdf", "a.rf", "a.txt", "z.html", "z.pdf", "z.rf", "z.txt"] - which isn't what I want.
On the other hand flipping ext and base works, as in - filenames.sort_by{|f| f.split(".").map{|base, ext| [extensions.index(base), ext] }}. Not completely sure why.
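A possible explanation for both observations, sketched below: map visits the two pieces of the split one at a time, and each piece is a String rather than an Array, so it is not destructured and ext is nil on every iteration. With ext and base flipped, the second piece happens to be looked up in extensions, which is why that variant orders by extension, though with no filename tie-breaker. Destructuring the split result once, outside of map, builds the key that was presumably intended.

```ruby
filenames  = ["z.pdf", "z.txt", "a.pdf", "z.rf", "a.rf", "a.txt", "z.html", "a.html"]
extensions = ["html", "txt", "pdf", "rf"]

# Each piece is yielded separately, so ext is always nil here.
broken_key = "z.pdf".split(".").map { |base, ext| [extensions.index(ext), base] }
p broken_key  # => [[nil, "z"], [nil, "pdf"]]

# Destructure the split result once to get [extension index, base name].
sorted = filenames.sort_by do |f|
  base, ext = f.split(".")
  [extensions.index(ext), base]
end
p sorted  # => ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]
```

This still relies on split('.'), so it assumes every name has exactly one dot.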
0
extensions = [".html", ".txt", ".pdf", ".rf"]
filenames.sort_by { |file_name_string|
  [ extensions.index( File.extname file_name_string ), file_name_string ]
}

3 Comments

I tried this and I got it sorting on the filename first and then the extension - ["a.html", "a.pdf", "a.rf", "a.txt", "z.html", "z.pdf", "z.rf", "z.txt"] which isn't the result I want.
That's enormously interesting, my machine gives output ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]... To get the output you indicated, I have to swap the order of elements in the array inside sort_by block body...
I'm running rails c on a Mac Pro and I still get the same answer. I just cut and pasted your answer into the rails console. Strange because your answer is similar to the other answers that work.
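One way to reproduce the differing output, offered as an assumption rather than a confirmed diagnosis: this answer depends on extensions containing the leading dots. If the console session still holds the dotless array from the question, index returns nil for every file and only the filename decides the order. The sort_with lambda is a hypothetical helper for this sketch.

```ruby
filenames = ["z.pdf", "z.txt", "a.pdf", "z.rf", "a.rf", "a.txt", "z.html", "a.html"]

# Same sort key as the answer, parameterized on the extensions array.
sort_with = lambda do |extensions|
  filenames.sort_by { |name| [extensions.index(File.extname(name)), name] }
end

with_dots    = sort_with.call([".html", ".txt", ".pdf", ".rf"])
without_dots = sort_with.call(["html", "txt", "pdf", "rf"])  # every index is nil

p with_dots     # => ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]
p without_dots  # => ["a.html", "a.pdf", "a.rf", "a.txt", "z.html", "z.pdf", "z.rf", "z.txt"]
```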
0
filenames = ["z.pdf", "z.txt", "a.pdf", "z.rf", "a.rf","a.txt", "z.html", "a.html"]
extensions = ["html", "txt", "pdf", "rf"]
extensions.each_with_object([]){|k,ob| ob << filenames.find_all {|i| File.extname(i)[1..-1] == k }.sort}.flatten
#=> ["a.html", "z.html", "a.txt", "z.txt", "a.pdf", "z.pdf", "a.rf", "z.rf"]

3 Comments

I added Edit 1 to show what the result should look like. I'm not asking to sort the filenames on their extensions alphabetically. I want to sort the filenames in the order of the extensions array.
1 - This result has the reverse sort of the filenames - ["z.html", "a.html", "z.txt", "a.txt", "z.pdf", "a.pdf", "z.rf", "a.rf"]. 2 - I would like the answer to use sort_by.
The question was to use "sort_by".
-1

There is no need to use the File class. A light, simple regex will do.

filenames.sort_by{|i| i.scan(/\..+$/)[0]}

1 Comment

This sorts by the extension itself, not by the extension's position inside extensions.
