my $line = "file1.gz file2.gz file3.gz";
my @abc = split('', $line);
print "@abc\n";

Expected output:

file1.gz
file2.gz
file3.gz

I want the output to be file1.gz in $abc[0], file2.gz in $abc[1], and file3.gz in $abc[2]. How do I split $line?

  • Well, no programming language can read your mind. split '' splits into individual characters. If all your filenames start with file..., then split /(?=file)/ would work, but there is no general solution. (Commented Jun 1, 2013 at 11:50)
  • @aragaer Your comment is factually wrong. split takes its arguments in the order pattern, string, limit; your order is wrong. And print "@abc\n" would work fine, provided that $" eq "\n" ($" is usually a space). (Commented Jun 1, 2013 at 11:53)
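
The two points made in the comments above can be checked with a short sketch. The variable names @chars and @names are illustrative only and do not come from the question.

```perl
use strict;
use warnings;

my $line = "file1.gz file2.gz file3.gz";

# An empty pattern splits the string into individual characters,
# so every letter, digit, dot, and space becomes its own element.
my @chars = split '', $line;
print scalar(@chars), " elements\n";

# Interpolating an array in double quotes joins its elements with $"
# (a single space by default). Localizing $" to "\n" prints one per line.
my @names = split ' ', $line;
{
    local $" = "\n";
    print "@names\n";
}
```

With the default $" the last print would put all three names on one line, which is why the question's print "@abc\n" does not produce one name per line on its own.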

5 Answers


Splitting a string by whitespace is very simple:

print $_, "\n" for split ' ', 'file1.gz file2.gz file3.gz';

This is actually a special form of split (the function usually takes a pattern rather than a string). From the documentation:

As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a literal string composed of a single space character (such as ' ' or "\x20"). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
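
As a rough illustration of the quoted passage, the sketch below compares the special ' ' form with two regex patterns on a string that has leading and repeated whitespace. The input string and the array names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical input: two leading spaces, three spaces, then a tab.
my $padded = "  file1.gz   file2.gz\tfile3.gz";

# Special awk-like mode: leading whitespace is dropped first.
my @awk = split ' ', $padded;

# Plain regex: the leading whitespace produces an empty first field.
my @regex = split /\s+/, $padded;

# A regex matching exactly one space: each space is its own separator,
# and the tab is not a separator at all.
my @single = split / /, $padded;

print scalar(@awk),    "\n";   # 3
print scalar(@regex),  "\n";   # 4
print scalar(@single), "\n";   # 6
```

Note that / / is an ordinary pattern and does not trigger the special case; only the string ' ' (or an omitted pattern) does.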


Here's an answer for the original question (with a simple string without any whitespace):

Perhaps you want to split on .gz extension:

my $line = "file1.gzfile2.gzfile3.gz";
my @abc = split /(?<=\.gz)/, $line;
print $_, "\n" for @abc;

Here I used the (?<=...) construct, which is a look-behind assertion: it makes split break the line at every position that is immediately preceded by the substring .gz.

If you work with a fixed set of extensions, you can extend the pattern to include them all:

my $line = "file1.gzfile2.txtfile2.gzfile3.xls";
my @exts = ('txt', 'xls', 'gz');
my $patt = join '|', map { '(?<=\.' . $_ . ')' } @exts;
my @abc = split /$patt/, $line;
print $_, "\n" for @abc;

2 Comments

the question has been changed to include spaces
@user2384801 Added explanation and link.

With $line as it is now, you can simply split the string on runs of one or more whitespace characters:

my @answer = split(' ', $line); # creates an @answer array

then

print("@answer\n");               # print array on one line

or

print("$_\n") for (@answer);      # print each element on one line

I prefer using () for split, print and for.

2 Comments

You should know that the default ' ' split is probably what you want instead of /\s+/. They are exactly the same, except the default strips leading whitespace before splitting.
@TLP Thanks a lot - I have always used /\s+/ and ignored the default ' '. I still find /\s+/ easier to understand, as it does what it shows... But I guess ' ' is easy to remember, does exactly what one usually wants (hardly anybody wants an empty string as the first element), and is presumably optimized to avoid the cost of a regex. Answer updated.

You already have multiple answers to your question, but I would like to add another minor one here that might help to add something.

To view data structures in Perl you can use Data::Dumper. To print a string you can use say, which appends a newline character "\n" on every call so that you do not have to add it explicitly.

I usually use \s, which matches a single whitespace character. If you add +, it matches one or more whitespace characters. You can read more about it in perlre.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

use feature 'say';

my $line = "file1.gz file2.gz file3.gz";
my @abc  = split /\s+/, $line;

print Dumper \@abc;
say for @abc;

1 Comment

Is there a reason that people are downvoting this answer? According to the official Perl documentation (perldoc): "In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/ ; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator."

Just use /\s+/ instead of '' as the separator. With this pattern all "extra" blanks between the names are removed, which is usually the behaviour you want. So, in your case it will be:

my $line = "file1.gz file2.gz file3.gz";
my @abc = split(/\s+/, $line);


I found this one to be very simple!

my $line = "file1.gz file2.gz file3.gz";

my @abc =  ($line =~ /(\w+[.]\w+)/g);

print $abc[0],"\n";
print $abc[1],"\n";
print $abc[2],"\n";

output:

file1.gz 
file2.gz 
file3.gz

Take a look at this tutorial to learn more about Perl regular expressions; scroll down to the "More matching" section.
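
One caveat with this approach, sketched below under the assumption that file names may contain characters outside \w: the pattern \w+[.]\w+ only captures word characters around a single dot. The input string here is hypothetical, and /(\S+)/g is shown only as one possible alternative for whitespace-separated names.

```perl
use strict;
use warnings;

# Hypothetical input: the first name contains a hyphen and two dots.
my $line = "my-file.tar.gz file2.gz";

# \w does not match '-', and only one dot is allowed per match,
# so the first name is captured only partially.
my @word_match = $line =~ /(\w+[.]\w+)/g;

# Capturing runs of non-whitespace keeps each name intact.
my @non_space = $line =~ /(\S+)/g;

print "$_\n" for @word_match;   # file.tar, then file2.gz
print "$_\n" for @non_space;    # my-file.tar.gz, then file2.gz
```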
