Can someone help me with the correct use of the split function in Perl?

Here is my input list called @input_lines:

google.com/test
yahoo.com/test
##############
somethingelse.com/test
##############
12345

my(@first_array,@second_array,@rand_no) = split(/^\#+/, @input_lines);
0

6 Answers

2

I'll make a guess on what you really mean:

First, you probably have a text input file input.txt with the following content:

 google.com/test
 yahoo.com/test
 ##############
 somethingelse.com/test
 ##############
 12345

You are trying to separate the records in this file, which are delimited by lines of 14 '#' characters. You can simply read the file with that line as the input record separator ($/) and be done:

 ...
 my $fn = 'input.txt';             # set the file name
 open my $fh, '<', $fn or die $!;  # open the file
 $/="\n##############\n";          # set the input record separator
 my @parts = <$fh>;                # read the file record-wise
 chomp @parts;                     # remove the record separator from data
 close $fh;                        # close the file
 ...

The elements of @parts now have the following content:

 $parts[0]
     google.com/test
     yahoo.com/test

 $parts[1]
     somethingelse.com/test

 $parts[2]
     12345

If the #-separators can vary in length, you can achieve this in a very similar way by slurping the file in one read operation and splitting at the separators afterwards:

 ...
 my $fn = 'input.txt';
 open my $fh, '<', $fn or die $!;
 undef $/;                           # remove the input record separator
 my @parts = split /\n#+\n/, <$fh>;  # read file as a block and split 
 close $fh;
 ...

with the same result.
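
If each record should end up as an array of lines (as in the question), the parts can be split once more at the newlines. The following is only a sketch: the file content is held in a hypothetical $content string so that it runs without input.txt, and @records is a name made up for illustration.

 use strict;
 use warnings;

 # Stand-in for the slurped content of input.txt (assumption: no trailing newline)
 my $content = join "\n",
     'google.com/test',
     'yahoo.com/test',
     '##############',
     'somethingelse.com/test',
     '##############',
     '12345';

 # Split into records at lines made up only of '#', then split each record into lines
 my @records;
 for my $record (split /\n#+\n/, $content) {
     push @records, [ split /\n/, $record ];
 }

 # Each element of @records is an array ref holding the lines of one record
 print "record $_: @{ $records[$_] }\n" for 0 .. $#records;
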

Regards

rbo

1 Comment

I like the way you used $/ to slurp the file.
1

If your @input_lines strings always have the same format, you can join all the strings into one and then split it into parts. Note that split /^#+/, @input_lines is wrong in your case: split expects a string, not an array.

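my $line = join ',', @input_lines;
# Split at each separator together with its surrounding commas,
# so the later splits do not produce empty leading fields.
my ($first_part, $second_part, $third_part) = split /,\#+,/, $line;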

my @first_array  = split ',', $first_part;
my @second_array = split ',', $second_part;
my @third_array  = split ',', $third_part;
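
To see why the split from the question does not work, here is a small sketch (the name @result is only for illustration). split evaluates its second argument in scalar context, so the array is reduced to its element count before anything is split:

use strict;
use warnings;

my @input_lines = (
  'google.com/test',
  'yahoo.com/test',
  '##############',
  'somethingelse.com/test',
  '##############',
  '12345',
);

# @input_lines in scalar context is its element count (6),
# so split only ever sees the string "6" and never the lines.
my @result = split /^\#+/, @input_lines;

print scalar(@result), " field(s): @result\n";
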

1 Comment

Thanks for the solution. All the others were correct too. You were the first to answer so picking yours.
1

split operates on strings, not arrays. Also, you cannot assign to several arrays in the same assignment: the list on the right hand side gets flattened, so the first array takes all.

Update: This code works, though:

my (@first, @second, @rand);

for my $array (\@first, \@second, \@rand) {
    my $line;
    do {
        push @$array, $line = shift @input_lines
    } until $line =~ /^#+/ or ! @input_lines;
    pop @$array if @input_lines;                 # Remove the separators
}
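
As a minimal illustration of the flattening mentioned above (the variable names here are made up for the example): in a list assignment the first array takes every value, while array references keep the groups apart.

use strict;
use warnings;

# The first array is greedy and takes the whole list; the second stays empty
my (@greedy, @empty) = ('a', 'b', 'c');
print scalar(@greedy), " ", scalar(@empty), "\n";

# Array references are single scalars, so each group stays separate
my ($group1, $group2) = ([ 'a', 'b' ], [ 'c' ]);
print scalar(@$group1), " ", scalar(@$group2), "\n";
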

1

You can do something like this. Each element of $output is an array ref that represents one of your arrays.

use strict; use warnings;
use Data::Dumper;

my @input_lines = (
  'google.com/test',
  'yahoo.com/test',
  '##############',
  'somethingelse.com/test',
  '##############',
  '12345',
);

my $output = []; # array ref
my $rand_no;
my $i = 0;
foreach my $line (@input_lines) {
  if ($line =~ m/^#+$/) {
    # if it's the # we move to the next index
    $i++;
    next;
  } 
  elsif ($line =~ m/^\d+$/) {
    # this is the random number
    $rand_no = $line;
  } else {
    # everything else goes into the current index
    push @{ $output->[$i] }, $line;
  }
} 

print Dumper $output, $rand_no;

Output:

$VAR1 = [
          [
            'google.com/test',
            'yahoo.com/test'
          ],
          [
            'somethingelse.com/test'
          ]
        ];
$VAR2 = '12345';

1

Assuming your input lines are in $string (otherwise use join "\n", @input_lines), you can use split like this:

my ($first, $second, $rand_no) = split /\n#+\n/m, $string;

print "`", $_, "`\n" for ($first, $second, $rand_no);

1

See both scripts below - one of them should work for you...

Script:

my @input_lines = <main::DATA>;
my $input_string = join '', @input_lines;   # the lines already end in "\n"
my @split_lines = split(/\s*[#\n\r]+\s*/, $input_string);
print "$_\n" for @split_lines;

__DATA__
google.com/test 
yahoo.com/test 
############## 
somethingelse.com/test 
############## 
12345

Output:

google.com/test
yahoo.com/test
somethingelse.com/test
12345


Script:

 use Data::Dumper;

 my @input_lines = <main::DATA>;
 my $input_string = join '', @input_lines;   # the lines already end in "\n"
 my @blocks = split(/\s*#+\s*/, $input_string);
 my @matches = ();
 push @matches, [ split(/\s*[\n\r]+\s*/, $_) ] for @blocks;

 print Dumper(@matches);

 __DATA__
 google.com/test 
 yahoo.com/test 
 ############## 
 somethingelse.com/test 
 ############## 
 12345

Output:

 $VAR1 = [
           'google.com/test',
           'yahoo.com/test '
         ];
 $VAR2 = [
           'somethingelse.com/test '
         ];
 $VAR3 = [
           '12345'
         ];
