I have an array of extensions and an array of file names:

exts = ['.zip', '.tgz', '.sql']
files = ['file1.txt', 'file2.doc', 'file2.tgz', 'file3.sql', 'file6.foo', 'file4.zip']

I want to filter the file names by one or more matching extensions. In this case, the result would be:

["file2.tgz", "file3.sql", "file4.zip"]

I know I can do this with a nested loop:

exts.each_with_object([]) do |ext, arr|
  files.each do |file|
    # Collect the file name itself when it contains the extension
    arr << file if file.include?(ext)
  end
end

This feels ugly to me. With select, I can avoid the feeling of nested loops:

files.select { |file| exts.any? { |ext| file.include?(ext) } }

This works and feels better. Is there still a more elegant way that I'm missing?

  • Where do the entries in files come from? If you are filtering files in a directory, Dir.glob could be a better approach. Commented Nov 11, 2022 at 10:15
  • Thanks, they do come from a glob, and that’s the approach I use – this is just an example. Commented Nov 11, 2022 at 12:41
  • FWIW, it’s a helpful reminder that matching via a glob would probably be a better approach for this specific case. I probably could have used a more generic example to suit the question. Commented Nov 11, 2022 at 13:08

3 Answers

I would use Enumerable#grep with a regexp like this:

exts = ['.zip', '.tgz', '.sql']
files = ['file1.txt', 'file2.doc', 'file2.tgz', 'file3.sql', 'file6.foo', 'file4.zip']

files.grep(/#{Regexp.union(exts)}$/)
#=> ["file2.tgz", "file3.sql", "file4.zip"]

I use Regexp.union instead of a simple exts.join('|') because the extensions contain dots (.), which have a special meaning in regular expressions. Regexp.union escapes those dots automatically.
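
To illustrate the escaping point, here is a minimal sketch comparing a hand-joined pattern with Regexp.union. The file names 'notes_zip' and 'file4.zip.bak' are made up for the comparison, and \z (end of string) is used here in place of $ as an assumption about the intended anchoring.

```ruby
exts  = ['.zip', '.tgz', '.sql']
# Hypothetical file names chosen to expose the difference
files = ['file2.tgz', 'notes_zip', 'file4.zip', 'file4.zip.bak']

# Joining by hand leaves "." unescaped, so it matches any character.
naive = /(?:#{exts.join('|')})\z/

# Regexp.union escapes each string before joining the alternatives.
safe = /#{Regexp.union(exts)}\z/

naive_matches = files.grep(naive)
safe_matches  = files.grep(safe)

p naive_matches # includes "notes_zip", because "_zip" satisfies ".zip"
p safe_matches  # only names that end in a literal ".zip", ".tgz" or ".sql"
```

The non-capturing group in the naive pattern is there only so that \z applies to every alternative; without it the anchor would bind to the last alternative alone.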

1 Comment

Thanks! It’s funny you mention this – I had added grep as a potential path I could take, but I missed the regex piece here. I think this is really clean.

Thinking about it a bit further, I realized I could make the select better if I changed the logic slightly:

exts = ['zip', 'tgz', 'sql']
files.select { |file| exts.include?(File.extname(file).delete_prefix('.')) }

As far as I can tell, this is as clean as I can get.

2 Comments

I think it would be clearer if you wrote exts = ['.zip', '.tgz', '.sql']. Then files.select { |f| exts.any? { |ext| f.end_with?(ext) } } or files.select { |f| exts.any? { |ext| File.extname(f) == ext } }.
Thanks, I did mean files. I agree that adding the extensions array to this example would make it clearer. any? I knew, but end_with? was new to me; thanks for that. This suggestion is very readable. I think you ought to add this as a possible answer.
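
As a possible shortening of the end_with? suggestion above: String#end_with? accepts several suffixes and returns true if any of them matches, so the inner any? block can be replaced by a splat. This is a sketch using the arrays from the question, with the extensions written with their leading dots.

```ruby
exts  = ['.zip', '.tgz', '.sql']
files = ['file1.txt', 'file2.doc', 'file2.tgz', 'file3.sql', 'file6.foo', 'file4.zip']

# end_with? takes multiple suffixes; the splat passes each extension as one.
selected = files.select { |f| f.end_with?(*exts) }

p selected
```

Because this compares plain suffixes, it would also handle compound extensions such as '.tar.gz' if one were added to exts.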

If you store the extensions in a set, each lookup takes constant time on average, which reduces the runtime complexity from O(NM) to O(N) for N files and M extensions.

require 'set' # needed for to_set before Ruby 3.2

exts = ['.zip', '.tgz', '.sql'].to_set
files.select { |file| exts.include?(File.extname(file)) }
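
One thing worth checking with this approach is what File.extname actually returns, since the set lookup is an exact match. The sketch below uses made-up names ('backup.tar.gz', 'README') to show that only the final extension is returned, and an empty string when there is none.

```ruby
require 'set'

exts  = Set['.zip', '.tgz', '.sql']
# Hypothetical file names, chosen to show edge cases
files = ['file2.tgz', 'backup.tar.gz', 'README', 'file4.zip']

# File.extname returns only the final extension ("" when there is none).
extnames = files.map { |f| File.extname(f) }

# Exact set membership: one hash lookup per file.
selected = files.select { |f| exts.include?(File.extname(f)) }

p extnames
p selected
```

So a compound extension like '.tar.gz' would never match through File.extname, even if it were added to the set; a suffix comparison would be needed for that case.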
