How to split a string with multiple patterns in perl?

Question

I want to split a string with multiple patterns:

For example:

my $string= "10:10:10, 12/1/2011";

my @string = split(/firstpattern/secondpattern/thirdpattern/, $string);

foreach(@string) {
    print "$_\n";
}

I want the output to be:

10
10
10
12
1
2011

What is the proper way to do this?

stevenl · Accepted Answer · 2011-11-24 05:56:12Z

41

Use a character class in the split pattern so that it matches any one of a set of delimiters.

my $string= "10:10:10, 12/1/2011";
my @string = split /[:,\s\/]+/, $string;

foreach(@string) {
    print "$_\n";
}

Explanation

The pair of slashes /.../ delimits the regular expression, i.e. the pattern to be matched.
The square brackets [...] define a character class.
Inside the class is the set of characters that can match: colon (:), comma (,), any whitespace character (\s), and forward slash (\/, escaped with a backslash because / also delimits the pattern).
The + quantifier matches one or more occurrences of the item immediately before it, which here is the whole character class. Without it, the comma and the space after it would be treated as two separate delimiters, producing an extra empty string in the result.
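A quick way to see why the + matters is to split the same string with and without it and compare the fields. The sketch only reuses the pattern from the answer; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# Without +, each delimiter character is its own separator, so the
# comma followed by a space yields an empty field between them.
my @single = split /[:,\s\/]/, $string;

# With +, a run of delimiter characters counts as a single separator.
my @runs = split /[:,\s\/]+/, $string;

print scalar(@single), " fields: ", join('|', @single), "\n";   # 7 fields: 10|10|10||12|1|2011
print scalar(@runs),   " fields: ", join('|', @runs),   "\n";   # 6 fields: 10|10|10|12|1|2011
```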

edited Nov 24, 2011 at 5:56

answered Nov 24, 2011 at 5:30

stevenl


4 Comments

quinekxi Over a year ago

Worked perfectly well! Thanks. By the way, would you mind explaining this code? /[:,\s\/]+/

quinekxi Over a year ago

Thank you for the additional input, that simply explains everything! :D

KingsInnerSoul Over a year ago

I know it is an old thread, but I am wondering how I should add []() to the list of delimiters? It seems to get rid of the []() when I just add it there.

stevenl Over a year ago

@KingsInnerSoul, Add a backslash in front of each of those, just like I have for the slash above
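A minimal sketch of that advice, using a made-up input string. The unescaped ] is what breaks the class, since it closes the class early; escaping [, ( and ) as well is harmless and keeps the intent obvious.

```perl
use strict;
use warnings;

# Hypothetical input containing brackets and parentheses as delimiters.
my $text = "a[b](c)d:e";

# Every bracket and parenthesis is escaped inside the character class.
my @parts = split /[:\[\]\(\)]+/, $text;

print "$_\n" for @parts;   # a, b, c, d, e on separate lines
```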

ikegami · 2011-11-24 07:11:44Z

6

Wrong tool! Match the numbers instead of splitting on what lies between them:

my $string = "10:10:10, 12/1/2011";
my @fields = $string =~ /([0-9]+)/g;
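Because a //g match in list context returns every capture, the result can also go straight into a list assignment. This is a sketch: the variable names, and the assumption that the date is month/day/year, are illustrative and not stated in the question.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# List assignment from a //g match in list context.
# Assumption: the date part is month/day/year.
my ($hour, $min, $sec, $mon, $day, $year) = $string =~ /([0-9]+)/g;

printf "%04d-%02d-%02d %02d:%02d:%02d\n", $year, $mon, $day, $hour, $min, $sec;
```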

answered Nov 24, 2011 at 7:11

ikegami


3 Comments

quinekxi Over a year ago

Yes, I know, sorry. I didn't know there was another approach.

ikegami Over a year ago

@quinekxi, No need to apologise, you didn't do anything wrong. A good reply usually comes from considering the bigger picture. Questions are often too specific.

quinekxi Over a year ago

Thanks, though, for giving me something to think about and another solution to consider.

Chris Charley · 2011-11-24 17:06:18Z

4

You can split on runs of non-digits:

#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $string= "10:10:10, 12/1/2011";
say for split /\D+/, $string;
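One caveat with split: if the string begins with a separator, split returns an empty first field, whereas a global match returns only what it matches. The sketch below uses a hypothetical variant of the input wrapped in brackets to show the difference; split drops trailing empty fields by default, so only the leading one appears.

```perl
use strict;
use warnings;

# Hypothetical input with a leading and a trailing non-digit.
my $wrapped = "[10:10:10, 12/1/2011]";

# Leading separator -> empty first field; the trailing one is dropped.
my @via_split = split /\D+/, $wrapped;

# A global match returns only the digit runs, so no empty fields.
my @via_match = $wrapped =~ /\d+/g;

printf "split: %d fields, first is '%s'\n", scalar @via_split, $via_split[0];
printf "match: %s\n", join ',', @via_match;
```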

answered Nov 24, 2011 at 17:06

Chris Charley


Trizen · 2011-11-24 10:55:38Z

2

my $string= "10:10:10, 12/1/2011";

# General form: split(m[(?:firstpattern|secondpattern|thirdpattern)+], $string);
my @string = split(m[(?:/| |,|:)+], $string);

print join "\n", @string;

answered Nov 24, 2011 at 10:55

Trizen


5 Comments

TLP Over a year ago

/| |,|: is better written as [/ ,:]

Joel Berger Over a year ago

@TLP, is it? IIRC alternations get compiled into a trie internally, does a character class? Not saying you are wrong, really a question.

TLP Over a year ago

@JoelBerger I don't know about the internals, but I think it's more readable. Here's a benchmark:

perl -wE "use Benchmark qw(cmpthese); $a=qq(10:10:10, 12/1/2011); cmpthese(100000, { Piped => sub { my @r = split (m[(?:/| |,|:)+], $a); }, Class => sub  { my @r = split (m[(?:[/ ,:])+], $a); } });"

          Rate Piped Class
Piped 142450/s    --  -27%
Class 194175/s   36%    --

Looks like the character class is 36% faster.
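The one-liner above wraps the program in double quotes, which suits Windows cmd; in a Unix shell the $a inside would be expanded by the shell. A script version of the same comparison, sketched below with Perl's core Benchmark module, avoids shell quoting and checks that both patterns agree before timing them. Rates depend on the Perl version and hardware, so no figures are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "10:10:10, 12/1/2011";

# The two equivalent patterns from the comments: an alternation of
# single characters versus a character class.
my %splitters = (
    Piped => sub { my @r = split m{(?:/| |,|:)+}, $str; return @r },
    Class => sub { my @r = split m{[/ ,:]+}, $str;      return @r },
);

# Make sure both produce the same fields before comparing their speed.
my @piped = $splitters{Piped}->();
my @class = $splitters{Class}->();
die "patterns disagree\n" unless "@piped" eq "@class";

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, \%splitters);
```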

TLP Over a year ago

Oops, didn't see that the m delimiter was brackets. Strange that it didn't complain. Well, with m##, the results go up to 45% faster.

James O'Brien Over a year ago

This answer is more general: it can be used with entire words as well.
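To illustrate that point with a made-up input, each branch of the alternation below is a whole word rather than a single character:

```perl
use strict;
use warnings;

# Illustrative input; the separators are words, not characters.
my $list = "apples and pears or plums";

# Alternation lets each branch be a multi-character pattern.
my @items = split /\s+(?:and|or)\s+/, $list;

print "$_\n" for @items;   # apples, pears, plums on separate lines
```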

reinierpost · 2011-11-24 12:00:30Z

2

To answer your original question: you were looking for the | (alternation) operator:

my $string = "10:10:10, 12/1/2011";

my @string = split(/:|,\s*|\//, $string);

foreach(@string) {
    print "$_\n";
}

But, as the other answers point out, you can often improve on that with further simplifications or generalizations.
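When the delimiters are arbitrary literal strings, one common idiom is to build the alternation from a list, passing each piece through quotemeta so characters such as / or | lose their regex meaning. The helper name split_on_any is hypothetical; sorting longest first is a precaution so that a delimiter such as ", " is tried before a shorter one that is its prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string on any of the literal @delimiters.
sub split_on_any {
    my ($string, @delimiters) = @_;
    # Longest first, each escaped, joined into one alternation.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length $b <=> length $a } @delimiters;
    return split /(?:$alternation)/, $string;
}

my @fields = split_on_any("10:10:10, 12/1/2011", ':', ', ', '/');
print "$_\n" for @fields;
```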

answered Nov 24, 2011 at 12:00

reinierpost


2 Comments

Brad Gilbert Over a year ago

Why are you linking to the 5.10.0 version of the page, instead of the version agnostic perldoc.perl.org/perlre.html#Metacharacters ?

reinierpost Over a year ago

@Brad Gilbert: Because that was the first one Google gave me, and I'm using 5.10 myself, and portability can potentially be an issue, and I didn't realize there was a version-agnostic version. Thanks for supplying the link.

TLP · 2011-11-24 16:14:45Z

2

If numbers are what you want, extract numbers:

use 5.010;  # enables say
my $string = "10:10:10, 12/1/2011";
my @numbers = $string =~ /\d+/g;
say for @numbers;

Capturing parentheses are not required, as specified in perlop:

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
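A small check of the quoted behavior, using a pattern chosen only for illustration: without capturing parentheses each whole match is returned, and with them only the captured groups are.

```perl
use strict;
use warnings;

my $string = "10:10:10, 12/1/2011";

# No capturing parentheses: each whole match is returned.
my @whole = $string =~ /\d+\/\d+/g;

# Capturing parentheses: the captured groups are returned instead.
my @groups = $string =~ /(\d+)\/(\d+)/g;

print "whole:  @whole\n";    # whole:  12/1
print "groups: @groups\n";   # groups: 12 1
```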

edited Nov 24, 2011 at 16:14

answered Nov 24, 2011 at 5:43

TLP


7 Comments

Joel Berger Over a year ago

I hadn't known about the behavior you highlighted, thanks, and good for golf too!

quinekxi Over a year ago

I didn't know I could use this kind of approach. Good thinking! Thank you so much!

TLP Over a year ago

@quinekxi You're welcome. split is a very nice tool, but works best with uniform delimiters, I feel. In this case, the common element is numbers, so it's easier to work with them.

quinekxi Over a year ago

@TLP Yes, I actually used this approach, but I didn't mark it as the answer, to stay true to the original question. Anyway, thanks for your idea. I'm glad to get such great ideas from strangers like you.

TLP Over a year ago

@quinekxi Many of my answers are not the solutions the OPs asked for, but the one I thought they really wanted. Your question was really "How do I best extract the numbers from this string?" So that's the answer you got. :)


Dave Cross · 2011-11-24 10:23:43Z

1

As you're parsing something that is rather obviously a date/time, I wonder if it would make more sense to use DateTime::Format::Strptime to parse it into a DateTime object.
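Dave Cross's suggestion uses the CPAN module DateTime::Format::Strptime. The sketch below swaps in the core Time::Piece module's strptime instead, so it runs without installing anything; it assumes the date part is month/day/year, which the question does not say.

```perl
use strict;
use warnings;
use Time::Piece;

my $string = "10:10:10, 12/1/2011";

# Assumption: month/day/year. Swap %m and %d for day/month/year input.
my $t = Time::Piece->strptime($string, "%H:%M:%S, %m/%d/%Y");

print $t->ymd, " ", $t->hms, "\n";
```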

answered Nov 24, 2011 at 10:23

Dave Cross
