18

I want to split a string on multiple patterns. For example:

my $string= "10:10:10, 12/1/2011";

my @string = split(/firstpattern/secondpattern/thirdpattern/, $string);

foreach(@string) {
    print "$_\n";
}

I want to have an output of:

10
10
10
12
1
2011

What is the proper way to do this?

7 Answers

41

Use a character class as the split pattern to match any of a set of possible delimiters.

my $string= "10:10:10, 12/1/2011";
my @string = split /[:,\s\/]+/, $string;

foreach(@string) {
    print "$_\n";
}

Explanation

  • The pair of slashes /.../ denotes the regular expression or pattern to be matched.

  • The pair of square brackets [...] denotes the character class of the regex.

  • Inside is the set of possible characters that can be matched: colons :, commas ,, any whitespace character \s, and forward slashes \/ (with the backslash as an escape character).

  • The + is needed to match one or more of the item immediately preceding it, which is the entire character class in this case. Without it, the comma and the space would be treated as 2 separate delimiters, giving you an additional empty string in the result.
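
To make the last point concrete, here is a minimal sketch that runs the same split with and without the +. The variable names @without_plus and @with_plus are only illustrative.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# Without +, the comma and the space each count as a delimiter,
# so an empty field appears between them.
my @without_plus = split /[:,\s\/]/, $string;

# With +, a run of delimiter characters is treated as one separator.
my @with_plus = split /[:,\s\/]+/, $string;

print scalar(@without_plus), " fields without +\n";   # 7, one of them empty
print scalar(@with_plus), " fields with +\n";         # 6
```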


4 Comments

Worked perfectly well! Thanks. Btw, would you mind explaining this code? /[:,\s\/]+/
Thank you for the additional input, that simply explains everything! :D
I know it is an old thread, but I am wondering how I should add []() to the list of delimiters? It seems to get rid of the []() when I just add it there.
@KingsInnerSoul, add a backslash in front of each of those, just like I have for the slash above.
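
A small sketch of what that comment suggests, assuming a made-up input string ($input is hypothetical, not from the question). Each of [ ] ( ) gets a backslash inside the character class:

```perl
use strict;
use warnings;

# Hypothetical input that uses brackets and parentheses as delimiters.
my $input = "a[b](c):d";

# Each of [ ] ( ) is preceded by a backslash inside the character class.
my @parts = split /[:,\s\/\[\]\(\)]+/, $input;

print "$_\n" for @parts;   # a, b, c, d on separate lines
```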
6

Wrong tool!

my $string = "10:10:10, 12/1/2011";
my @fields = $string =~ /([0-9]+)/g;

3 Comments

Yes, I know, I'm sorry, I didn't know there was another approach to it.
@quinekxi, No need to apologise, you didn't do anything wrong. A good reply usually comes from considering the bigger picture. Questions are often too specific.
Thanks though for giving me something to think of and consider another solution.
4

You can split on non-digits:

#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $string= "10:10:10, 12/1/2011";
say for split /\D+/, $string;
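
One caveat worth sketching: if the string starts with non-digit text, split returns a leading empty field, whereas matching the digits directly does not. The input below is a hypothetical variation on the question's string.

```perl
use strict;
use warnings;

# Hypothetical input that starts with non-digit text.
my $labelled = "Date: 12/1/2011";

my @split_fields = split /\D+/, $labelled;   # first element is an empty string
my @match_fields = $labelled =~ /\d+/g;      # only the digit runs

print scalar(@split_fields), " fields from split\n";       # 4
print scalar(@match_fields), " fields from the match\n";   # 3
```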


2
my $string= "10:10:10, 12/1/2011";

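# General form: alternate whole patterns inside a non-capturing group.
# my @string = split(m[(?:firstpattern|secondpattern|thirdpattern)+], $string);

# Applied to this string: split on runs of slash, space, comma or colon.
my @string = split(m[(?:/| |,|:)+], $string);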

print join "\n", @string;
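
Because the alternatives are full patterns rather than single characters, the same shape also works when the delimiters are whole words. A minimal sketch with an invented input ($list and the word delimiters are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical input where the separators are words as well as punctuation.
my $list = "apples and pears, plums or figs";

# Alternation of whole patterns: " and ", " or ", or a comma with optional spaces.
my @items = split m{(?:\s+and\s+|\s+or\s+|,\s*)+}, $list;

print join("\n", @items), "\n";   # apples, pears, plums, figs
```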

5 Comments

/| |,|: better written as [/ ,:]
@TLP, is it? IIRC alternations get compiled into a trie internally, does a character class? Not saying you are wrong, really a question.
@JoelBerger I don't know about the internals, but I think it's more readable. Here's a benchmark: perl -wE "use Benchmark qw(cmpthese); $a=qq(10:10:10, 12/1/2011); cmpthese(100000, { Piped => sub { my @r = split (m[(?:/| |,|:)+], $a); }, Class => sub { my @r = split (m[(?:[/ ,:])+], $a); } });" Piped 142450/s -- -27% // Class 194175/s 36% -- Looks like character class is 36% faster.
Oops, didn't see that the m delimiter was brackets. Strange that it didn't complain. Well, with m##, the results go up to 45% faster.
This answer is more general - it can be used with entire words as well
2

To answer your original question: you were looking for the | operator:

my $string = "10:10:10, 12/1/2011";

my @string = split(/:|,\s*|\//, $string);

foreach(@string) {
    print "$_\n";
}

But, as the other answers point out, you can often improve on that with further simplifications or generalizations.
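
As one possible simplification (a sketch, not the only option), the single-character alternatives can be folded into a character class followed by optional whitespace. For this input it gives the same fields as the alternation:

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# The alternation from the answer above.
my @with_alternation = split /:|,\s*|\//, $string;

# Same delimiters as a character class, with optional trailing whitespace.
my @with_class = split /[:,\/]\s*/, $string;

print "same result\n" if "@with_alternation" eq "@with_class";
```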

2 Comments

Why are you linking to the 5.10.0 version of the page, instead of the version agnostic perldoc.perl.org/perlre.html#Metacharacters ?
@Brad Gilbert: Because that was the first one Google gave me, and I'm using 5.10 myself, and portability can potentially be an issue, and I didn't realize there was a version-agnostic version. Thanks for supplying the link.
2

If numbers are what you want, extract numbers:

use feature 'say';
my $string = "10:10:10, 12/1/2011";
my @numbers = $string =~ /\d+/g;
say for @numbers;

Capturing parentheses are not required, as specified in perlop:

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
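
The difference described in that passage can be seen in a short sketch. The input string and variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical key=value input.
my $pairs = "h=10,m=20,s=30";

# No parentheses: list context returns each whole match.
my @whole = $pairs =~ /\w+=\d+/g;          # ("h=10", "m=20", "s=30")

# With capturing parentheses: returns the captured pieces of every match.
my @captured = $pairs =~ /(\w+)=(\d+)/g;   # ("h", 10, "m", 20, "s", 30)

print scalar(@whole), " whole matches\n";
print scalar(@captured), " captured pieces\n";
```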

7 Comments

I hadn't known about the behavior you highlighted, thanks, and good for golf too!
I didn't know I could use this kind of approach. Good thinking! Thank you so much!
@quinekxi You're welcome. split is a very nice tool, but works best with uniform delimiters, I feel. In this case, the common element is numbers, so it's easier to work with them.
@TLP Yes, actually I used this approach, but I didn't mark it as the answer, just to stay in line with the original question. Anyway, thanks for your idea. I am glad I've got such great ideas from strangers like you.
@quinekxi Many of my answers are not the solutions the OPs asked for, but the one I thought they really wanted. Your question was really "How do I best extract the numbers from this string?" So that's the answer you got. :)
1

As you're parsing something that is rather obviously a date/time, I wonder if it would make more sense to use DateTime::Format::Strptime to parse it into a DateTime object.
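
DateTime::Format::Strptime is a CPAN module, so as a rough sketch of the same idea this example swaps in the core Time::Piece module instead. It assumes the date part is month/day/year, which the question does not confirm.

```perl
use strict;
use warnings;
use Time::Piece;

my $string = "10:10:10, 12/1/2011";

# Assumption: the date is month/day/year; swap %m and %d if it is day/month/year.
my $t = Time::Piece->strptime($string, "%H:%M:%S, %m/%d/%Y");

# Individual fields are then available through accessor methods.
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
    $t->year, $t->mon, $t->mday, $t->hour, $t->min, $t->sec;
```

With a parsed object, out-of-range values such as a 13th month are rejected by the parser rather than passed through as plain numbers.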

