The regex engine is perfectly happy to operate on strings of bytes (though using \d and such may not make any sense), so your approach is perfectly fine. But while it's quite efficient, it can be sped up.
What if we used chr on the bytes to strip rather than using ord on all the characters read?
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my %to_strip = map { chr($_) => 1 } @to_strip;
$data =~ s/(.)/ $to_strip{$1} ? "" : $1 /seg;
What if we took it a step further, and made the replacement choice even sooner?
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my %to_strip = map { chr($_) => 1 } @to_strip;
my %map = map { $_ => ( $to_strip{$_} ? "" : $_ ) } map chr, 0x00..0xFF;
$data =~ s/(.)/$map{$1}/sg;
But we're still doing a lot of needless replacements. What if we searched only for the specific characters we want to remove?
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my $pat = "[" . quotemeta( pack( 'C*', @to_strip ) ) . "]+";
my $re = qr/$pat/;
$data =~ s/$re//g;
This one is much faster for three reasons:
- As previously mentioned, we greatly reduced the number of matches, which reduces the number of times the replacement expression needs to be evaluated and concatenated.
- The regex engine can check for matching characters far faster than our Perl code can.
- We eliminated the need for captures, which are (relatively speaking) quite slow.
Remember that @to_strip, %to_strip, %map, $pat and $re only need to be calculated once, not once per read. When I talked about speed above, I wasn't including the time needed to calculate these, since I assumed you will be doing multiple reads and replaces.
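As a minimal sketch of that point, the pattern below is compiled once and then reused for every chunk read. The in-memory filehandle, the sample bytes, and the 8-byte chunk size are assumptions made purely for illustration; substitute your real handle and read size.

```perl
use strict;
use warnings;

# Computed once, outside the read loop.
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my $pat = "[" . quotemeta( pack( 'C*', @to_strip ) ) . "]+";
my $re = qr/$pat/;

# Hypothetical input: "H", 0x05, "i", 0x0B, "@", "!" repeated three times.
my $input = pack( 'C*', 72, 5, 105, 11, 64, 33 ) x 3;
open( my $fh, '<:raw', \$input ) or die "Can't open in-memory file: $!";

my $out = '';
while ( read( $fh, my $data, 8 ) ) {
    $data =~ s/$re//g;    # only the substitution runs once per read
    $out .= $data;
}
close($fh);

print "$out\n";    # Hi!Hi!Hi!
```

Because each unwanted byte is removed independently of its neighbours, it doesn't matter where the chunk boundaries fall.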
That said, if it's reasonable to hardcode the bytes to remove, tr///d will give you the best performance.
$data =~ tr/\x05\x0B-\x0D\x15\x40\x5B\x5C\x62\x6B//d;
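If you do hardcode the list, it's worth checking that it really matches the numeric list (note that the range \x0B-\x0D covers 11, 12 and 13). Here's a quick sanity check, offered as a sketch: it runs both versions over all 256 byte values and compares the results. The variable names are made up for the example.

```perl
use strict;
use warnings;

my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my $pat = "[" . quotemeta( pack( 'C*', @to_strip ) ) . "]+";
my $re = qr/$pat/;

# One of every possible byte value.
my $all_bytes = pack( 'C*', 0x00 .. 0xFF );

# Strip a copy with the dynamic regex and a copy with the hardcoded tr///d.
( my $via_regex = $all_bytes ) =~ s/$re//g;
( my $via_tr    = $all_bytes ) =~ tr/\x05\x0B-\x0D\x15\x40\x5B\x5C\x62\x6B//d;

print $via_regex eq $via_tr ? "same\n" : "different\n";    # same
print length($via_tr), "\n";                               # 246
```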
It's not effective to use tr/// from a dynamic list because tr/// doesn't interpolate. We have to resort to building a sub, and invoking a sub is relatively slow.
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my $class = quotemeta( pack( 'C*', @to_strip ) );
my $inline_stripper = eval("sub { \$_[0] =~ tr/$class//d; }");
$inline_stripper->($data);
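Since the sub is built from a string, two details are easy to get wrong: $_[0] must be escaped so that it isn't interpolated into the string, and a failed compile leaves you with undef. The following is a sketch of the same idea with an error check added; the sample data and the $removed variable are assumptions for illustration.

```perl
use strict;
use warnings;

my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my $class = quotemeta( pack( 'C*', @to_strip ) );

# The backslash before $_[0] keeps it out of the string interpolation;
# only $class is interpolated before the code is compiled.
my $inline_stripper = eval("sub { \$_[0] =~ tr/$class//d; }")
    or die "Failed to compile stripper: $@";

# Hypothetical sample: "H", 0x05, "i", "b" (98), "!"
my $data = pack( 'C*', 72, 5, 105, 98, 33 );

# $_[0] is an alias to $data, so $data is modified in place.
# tr/// returns the number of bytes it deleted.
my $removed = $inline_stripper->($data);

print "$data ($removed removed)\n";    # Hi! (2 removed)
```

As with $re, build the sub once and call it for each read.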
The following is a reasonably efficient (but surely slower) non-regex approach.
my @to_strip = ( 5, 11, 12, 13, 21, 64, 91, 92, 98, 107 );
my @to_strip_lookup; $to_strip_lookup[$_] = 1 for @to_strip;
$data = pack 'C*', grep !$to_strip_lookup[$_], unpack 'C*', $data;