How to get regex to work in a perl script?

Question

I am working in a Debian-based Linux environment (a Proxmox server, to be precise), and I am writing a Perl script.

My problem: I have a folder with some files in it, and every file in this folder has a number as its name (for example: 100, 501, 102...). The lowest possible number is 100, and there is no upper limit.

I want my script to return only the files whose name is between 100 and 500. So I wrote this:

system(ls /the/path/to/my/files | grep -E "^[1-4][0-9]{2}|5[0]{2}");

I think my regex and the command are fine, because this works when I type it into a terminal. But as soon as I execute my script, I get these error messages:

String found where operator expected at backupsrvproxmox.pl line 3, near "E "^[1-4][0-9]{2}|5[0]{2}""
    (Do you need to predeclare E?)
Unknown regexp modifier "/b" at backupsrvproxmox.pl line 3, at end of line
syntax error at backupsrvproxmox.pl line 3, near "E "^[1-4][0-9]{2}|5[0]{2}""
Execution of backupsrvproxmox.pl aborted due to compilation errors.

I also tried egrep, but it still doesn't work.

I don't understand why the error message is about the /b modifier, since I only use integers and no strings.

Any help would be appreciated!

Do your file names begin with the number you are looking for? E.g. 100_bears_in_the_woods.txt or 5000000_ways_to_cook_lizards.doc?
– TLP, Commented Jan 7, 2022 at 13:03

zdim · Accepted Answer · 2022-05-04 16:24:39Z

Instead of calling system tools via system, you can do it all very nicely in your program:

# Keep only the paths whose filename starts with a number from 100 to 500
my @files = grep { 
    my ($n) = m{.*/([0-9]+)};                        #/ 
    defined $n and $n >= 100 and $n <= 500;
}
glob "/the/path/to/my/files/*";

This assumes that the numbers are at the beginning of the filename, as I gathered from the question, so the subpattern for the filename itself directly follows a /. ^† (That "comment" #/ on the right is there merely to turn off wrong and confusing syntax highlighting in the editor.)

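The command you tried didn't work because its syntax is wrong: system takes either a string or a list of strings, while you gave it a bunch of "barewords". With no quotes around the command, perl tried to parse the shell text as Perl code, and the slashes in the path were presumably read as regex delimiters, which would explain the complaint about a regexp modifier. That confused the interpreter into emitting a convoluted error message (most of the time perl's error messages are right to the point).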
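For completeness, here is a minimal sketch of what a correctly quoted shell-out could look like, in case you still want to go that way. The temporary directory and the file names are made up for the demonstration. Note also that the pattern from the question is not grouped or anchored at the end, so it would let names like 1500 through; the sketch below uses a grouped, anchored pattern instead, which is my change and not what the question had.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical test directory holding a few numerically named files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(100 250 500 501 1500)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# q{} keeps the pattern literal: no interpolation, no escaping needed.
my $pattern = q{^([1-4][0-9]{2}|500)$};

# The whole pipeline is ONE Perl string, which system hands to the shell.
my $cmd = "ls '$dir' | grep -E '$pattern'";

# system only returns the exit status; the matches go straight to STDOUT.
my $status = system($cmd);
die "Command failed with status $status" if $status != 0;

# To get the names into the program, capture the output with qx instead.
my @names = qx{$cmd};
chomp @names;
print "Captured: @names\n";
```

This still parses the output of ls and starts a shell, so it is shown only to illustrate the quoting, not as a recommendation.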

But there is no need to suffer through those syntax details, which can get rather involved here, nor through shell invocations, which are complex and messy under the hood, and inefficient.

^† It also assumes that the files are not in the current directory: a path is passed to glob (and not just * for files in the current directory), so glob returns each filename with its path, which is why we need the .*/ to greedily get to the last / before matching the filename.

But if we are in the current directory that won't work, since there would be no / in the filename. To cover this case the regex needs to be modified, for example like

my ($n) = m{ (?: .*/ | ^) ([0-9]+)}x;

This matches filenames beginning with a number, either after the last slash in the path (with .*/ subpattern) or at the beginning of the string (with ^ anchor).

The /x modifier makes the regex discard literal whitespace in the pattern, so we can use spaces freely (along with newlines, and # for comments!) to make that mess hopefully more readable. I also use {} as delimiters so as not to have to escape the / in the pattern (with any delimiters other than // the leading m is required).
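As a quick check of the modified regex, here is a small sketch that runs the same filter over a plain list of strings instead of glob output. The sample names are invented for illustration; some carry a path and some don't.

```perl
use strict;
use warnings;

# Hypothetical sample names, with and without a leading path.
my @names = qw(
    /the/path/to/my/files/100
    /the/path/to/my/files/501
    250
    99
    500_notes.txt
);

my @in_range = grep {
    # Number right after the last slash, or at the very start of the string.
    my ($n) = m{ (?: .*/ | ^ ) ([0-9]+) }x;
    defined $n and $n >= 100 and $n <= 500;
} @names;

print "$_\n" for @in_range;
```

This should keep /the/path/to/my/files/100, 250 and 500_notes.txt. The last one passes because the regex only requires the name to begin with a number, not to consist of one.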

edited May 4, 2022 at 16:24

answered Jan 7, 2022 at 10:25


3 Comments

Dave Mitchell Over a year ago

the strings returned by glob include the pathname, so that regex won't match as-is.

zdim Over a year ago

@DaveMitchell Indeed -- thank you very much for the comment! (Curiously, I did test but in a one-liner and with files in the current directory so it never bombed me.) Fixed, and what then triggered addition of an explanation which I felt was suitable here...

TLP Over a year ago

Maybe you are using the wrong tools here. File::Basename? File::Find?
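A rough sketch of what the File::Basename suggestion might look like: strip the directory part with basename first, then test the bare name, so no path-aware regex is needed. The temporary directory and file names are made up for the demonstration, and this variant only accepts names that are entirely numeric, which is an assumption based on the question.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Hypothetical test directory holding a few numerically named files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(99 100 250 500 501 1500)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# glob returns full paths; basename reduces each one to the bare filename.
my @wanted = grep { /^[0-9]+$/ and $_ >= 100 and $_ <= 500 }
             map  { basename($_) }
             glob "$dir/*";

print "@wanted\n";
```

Keep in mind that glob splits its argument on whitespace, so a directory name containing spaces would need extra care.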

Shawn · Accepted Answer · 2022-01-07 10:48:37Z

Using a regular expression to try to match a range of numbers is just a pain. And this is perl; no need to shell out to external programs to get a list of files (Generally also a bad idea in shell scripts; see Why you shouldn't parse the output of ls(1))!

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

sub getfiles {
    my $directory = shift;
    opendir my $dir, $directory or die "Unable to open $directory: $!";
    my @files =
        grep { /^\d+$/ && $_ >= 100 && $_ <= 500 } readdir $dir;
    closedir $dir;
    return @files;
}

my @files = getfiles '/the/path/to/my/files/';
say "@files";

Or using the useful Path::Tiny module:

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Path::Tiny;

# Returns a list of Path::Tiny objects, not just names.
sub getfiles {
    my $dir = path($_[0]);
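    # Compare the basename: a Path::Tiny object stringifies to its full path, which is not a number.
    return grep { $_->basename >= 100 && $_->basename <= 500 } $dir->children(qr/^\d+$/);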
}

my @files = getfiles '/the/path/to/my/files/';
say "@files";
