I am dealing with a CSV file that is full of lines like these:

   "ATSMM","CCC","43 676 017111 / 017113"
   "AERCM","XXX","54 320071 0900 / 0999"

At the moment, the script I'm using can recognize the number before the "/" and the number after it:

  my ($data, $start, $end) = m|(.* )(\d+) / (\d+)|;

The code above saves the text that precedes the first number in $data, the number before the "/" in $start, and the number after it in $end. My plan for sorting this was to put each number stored in $start into array1 and each number stored in $end into array2. The aim is to separate each number digit by digit so I can list out the range.
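If "list out the range" means enumerating every number from the start value to the end value of each pair, a minimal sketch follows. It assumes the end number has the same number of digits as the start number (true for both sample lines), and it pads each value back to that width with sprintf so leading zeros such as those in 0900 survive. The helper name expand_range is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a "start / end" pair into every number in
# between, padded to the width of the start number so leading zeros survive.
sub expand_range {
    my ($start, $end) = @_;
    my $width = length $start;
    # 0 + forces a numeric range; sprintf puts the zero padding back
    return map { sprintf "%0${width}d", $_ } (0 + $start) .. (0 + $end);
}

# Sample lines from the question
my @lines = (
    '"ATSMM","CCC","43 676 017111 / 017113"',
    '"AERCM","XXX","54 320071 0900 / 0999"',
);

for my $line (@lines) {
    # Capture the numbers on either side of " / "
    my ($start, $end) = $line =~ m|(\d+) / (\d+)|
        or next;
    my @numbers = expand_range($start, $end);
    print scalar(@numbers), " numbers: $numbers[0] .. $numbers[-1]\n";
}
```

For the sample data this reports 3 numbers for the first line (017111 to 017113) and 100 numbers for the second (0900 to 0999).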

I am using Perl scripts, but any other options are welcome.

Thanks!

  • Sounds like a classic XY-problem. If you need to parse CSV data with quoted fields, maybe you should look at a CSV parser, like Text::CSV. Commented Mar 17, 2016 at 13:49
  • I don't understand what your problem is. Are you looking for push @starts, $start; push @ends, $end or am I completely misunderstanding the problem statement? Commented Mar 17, 2016 at 14:02
  • You can collect the pair as a record in an array of references, and still sort the array using a field in the record. It's tricky looking syntactically, but easily done. Basically a push @ary, [ $2, $3 ]; Commented Mar 17, 2016 at 15:14

3 Answers

You want:

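 my ( @array1, @array2 );
 while (<>) {
     # Capture the numbers on either side of the " / "
     my ($start, $end) = m|(\d+) / (\d+)|
        or next;
     push @array1, $start;
     push @array2, $end;
 }

 # Sort the indexes by start number, then by end number,
 # then reorder both arrays with slices so the pairs stay aligned
 my @sorted_indexes = sort {
    $array1[$a] <=> $array1[$b]
       ||
    $array2[$a] <=> $array2[$b]
 } 0..$#array1;
 @array1 = @array1[@sorted_indexes];
 @array2 = @array2[@sorted_indexes];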

With a proper CSV parser:

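 use strict;
 use warnings;
 use Text::CSV_XS qw( );

 # Path to the CSV file, taken from the command line
 my $qfn = shift(@ARGV) or die("usage: $0 file.csv\n");
 open(my $fh, '<', $qfn)
    or die("Can't open \"$qfn\": $!\n");

 my ( @array1, @array2 );
 my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
 while (my $row = $csv->getline($fh)) {
     # The third field holds e.g. "43 676 017111 / 017113"
     my ($start, $end) = $row->[2] =~ /(\d+) \/ (\d+)/
        or die("Bad data");

     push @array1, $start;
     push @array2, $end;
 }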

By the way, many people avoid parallel arrays and would prefer a single array of pairs:

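 my @array;
 while (<>) {
     # Keep each start/end pair together as one record
     my ($start, $end) = m|(\d+) / (\d+)|
        or next;
     push @array, [ $start, $end ];
 }

 # Sort by start number, then by end number to break ties
 @array = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @array;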
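
A self-contained run of the array-of-pairs version, with the question's two sample lines held in an array instead of read from a file (an assumption for illustration; CSV parsing is left out). Because <=> compares numerically, 0900 sorts ahead of 017111 even though it appears second in the input.

```perl
use strict;
use warnings;

# Sample lines from the question; in practice these would come from the file
my @lines = (
    '"ATSMM","CCC","43 676 017111 / 017113"',
    '"AERCM","XXX","54 320071 0900 / 0999"',
);

my @array;
for my $line (@lines) {
    # Keep each start/end pair together as one record
    my ($start, $end) = $line =~ m|(\d+) / (\d+)|
        or next;
    push @array, [ $start, $end ];
}

# Numeric sort on start, then on end to break ties
@array = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @array;

print "$_->[0] / $_->[1]\n" for @array;
```

This prints "0900 / 0999" followed by "017111 / 017113". The original strings are kept in the records, so the leading zeros are preserved in the output.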

Probably you are looking for this:

my ($start, undef, $end) = m{"((\d|\s)+) / (\d+)"};

It is more specific about the double quotes, so it matches only inside the third field of your CSV line.
It also uses { and } as the pattern delimiters, because | is used inside the pattern to separate alternatives.
There are three capture groups in the pattern (three pairs of (), two of them nested), but the second one (the inner one) exists only so that the + quantifier applies to the alternatives (a digit or a whitespace character), so its result is discarded by assigning it to undef on the left.
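
Running that pattern on the first sample line shows what each variable receives: $start gets everything before the slash in the third field (43 676 017111), not only the number next to it. The second pattern below is a variation added for comparison, not part of this answer: a character class replaces the alternation, and a lazy prefix leaves the last number before the slash in its own group.

```perl
use strict;
use warnings;

my $line = '"ATSMM","CCC","43 676 017111 / 017113"';

# The pattern from the answer; the nested group only exists for the +
my ($start, undef, $end) = $line =~ m{"((\d|\s)+) / (\d+)"};
print "start=[$start] end=[$end]\n";

# Variation: a character class needs no inner group, and a separate
# capture isolates the last number before the slash
my ($prefix, $first, $last) = $line =~ m{"([\d\s]*?)(\d+) / (\d+)"};
print "prefix=[$prefix] first=[$first] last=[$last]\n";
```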

You can collect each pair as a record in an array of array references and still sort the array by any field of the record. The syntax looks tricky, but it is easily done; basically: push @ary, [ $2, $3 ];

Then, to sort the records by a particular field, you could do this:

use strict;
use warnings;

$/ = "\n";

my $line;
my @records = ();

while ( $line = <DATA> ) {
  if ($line =~ m|(.* )(\d+) / (\d+)|) {
    push @records, [$2, $3];
  }
}

sub getSortedRecords {
  my ( $field, $direction ) = @_;
  return sort {
      my($r1, $r2) = ($a, $b);
      # Swap the operands to reverse the order for a descending sort
      ($r1, $r2) = ($b, $a) if $direction eq 'descending';
      $$r1[ $field ] <=> $$r2[ $field ];
  } @records;
}

print "Ascending, fld 0\n";
for ( getSortedRecords( 0, 'ascending' ) ) {
  print $$_[0], " / ", $$_[1], "\n";
}

print "\nAscending, fld 1\n";
for ( getSortedRecords( 1, 'ascending' ) ) {
  print $$_[0], " / ", $$_[1], "\n";
}

print "\nDescending, fld 0\n";
for ( getSortedRecords( 0, 'descending' ) ) {
  print $$_[0], " / ", $$_[1], "\n";
}

print "\nDescending, fld 1\n";
for ( getSortedRecords( 1, 'descending' ) ) {
  print $$_[0], " / ", $$_[1], "\n";
}


__DATA__
"ATSMM","CCC","43 676 1 / 4"
"AERCM","XXX","54 320071 2 / 3"
"ATSMM","CCC","43 676 3 / 2"
"AERCM","XXX","54 320071 4 / 1"

Output:

Ascending, fld 0
1 / 4
2 / 3
3 / 2
4 / 1

Ascending, fld 1
4 / 1
3 / 2
2 / 3
1 / 4

Descending, fld 0
4 / 1
3 / 2
2 / 3
1 / 4

Descending, fld 1
1 / 4
2 / 3
3 / 2
4 / 1
