0

I wanted to remove white spaces from multiple datafiles through the for loop below. I inserted the for loop inside a function. The function can read the input data files, but the output files print doesn't work correctly unless I reset the variable ($new_data) after every time it prints to a file. Otherwise, the earlier data gets appended to the later data. Also, is there any problem if the same files as the input and output since I have no use of the input file later?

Pass @row for reading from inputFile and $new_data for writing to the outputFile

$dir = '/****/';

$inputFileSpring = $dir . "SpringSIMS.dat";
$inputFileSummer = $dir . "SummerSIMS.dat";
$inputFileFall = $dir . "FallSIMS.dat";

$outputFileSpring = $dir . "Spring.dat";
$outputFileSummer = $dir . "Summer.dat";
$outputFileFall = $dir . "Fall.dat";

#Read Spring SIMS Data
open (NOTE, "$inputFileSpring" || die "Could not open $inputFileSpring\n");
processFile(@row=<NOTE>);
close(NOTE);

#Write Spring Data
open(NOTE, ">$outputFileSpring" || die "Could not open $inputFileSpring\n");
print NOTE $new_data;
close(NOTE);
reset('new_data');

#Read Summer SIMS Data
open (NOTE, "$inputFileSummer" || die "Could not open $inputFileSummer\n");
processFile(@row=<NOTE>);
close(NOTE);

#Write Summer Data
open(NOTE, ">$outputFileSummer" || die "Could not open $inputFileSummer\n");
print NOTE $new_data;
close(NOTE);
reset('new_data');

#Read Fall SIMS Data
open (NOTE, "$inputFileFall" || die "Could not open $inputFileFall\n");
processFile(@row=<NOTE>);
close(NOTE);

#Write Fall Data
open(NOTE, ">$outputFileFall" || die "Could not open $inputFileFall\n");
print NOTE $new_data;
close(NOTE);
reset('new_data');


sub processFile
{
    for $row(@row) {
        chop($row);
        @field = split(/\|/, $row);
        for ($i=0; $i<@field; $i++) {
            if ($field[$i] =~ /^ /)
            {
                $field[$i] = " ";
            }
            else
            {
                $field[$i] =~ s/ *$//g;
            }
            $new_data .= $field[$i] . "|";
        }
        $lastchar = chop($new_data);
        if (@field == 15) {
            $new_data .= "|0";
        }
        $new_data .= "\n";
    }
    # return $new_data;
} # END  sub processFile

exit;

3 Answers 3

3

Wowzers. Well, your main problem is that you're using global variables. As a general rule, you should only resort to global variables when... well, never, really. And certainly not in simple cases like this.

If you use a lexical scope on your variables, and pass arguments to the subroutines, you will never notice issues like this. E.g.:

my $foo = process($bar);

sub process {
    my $arg = shift;
    my $value = ....;
    return $value;
}

Now, I can't help but notice that in each case you perform the exact same open, so why not include that into your subroutine. As a benefit, you don't need to worry about closing file handles, as they are closed automatically when they go out of scope.

Not sure what your $last_char variable is for, so I left it as a lexical. I've also done nothing about the code in your subroutine, except fix the atrocious indentation. Notable changes in your code:

  • Using strict and warnings!
  • Use a return value, returning the value of a lexically scoped variable
  • Passing arguments to subroutines
  • chop -> chomp. You basically should never use chop.
  • Using a basename list to build file names instead of repeating similar names
  • Using three argument open, with explicit mode and a lexical file handle.

Note: You should never, ever write perl code without using use strict; use warnings;. There is no benefit from not using them: You'll only spend more time trying to find simple errors.

Note#2: Untested code

use strict;
use warnings;

my @seasons = ("Spring", "Summer", "Fall");

for my $season (@seasons) {
    my $input  = $season . "SIMS.dat";
    my $output = $season . ".dat";
    output_data($input, $output);
}

sub processFile {
     my $file = shift;
     open my $fh, '<', $file or die "$file: $!";
     while (my $row = <$fh>) {
         chomp $row;  # NOTE: never use chop, use chomp instead
         my @field = split(/\|/, $row);
         for (my $i=0; $i<@field; $i++){
             if ($field[$i] =~ /^ /) {
                 $field[$i] = " ";
             } else {
                 $field[$i] =~ s/ *$//g;
             }
         }
         my $new_data = join "|", @field;
         if(@field == 15) {
             $new_data .= "|0";
         }
         $new_data .= "\n";
     }
     return $new_data;
}

sub output_data {
    my ($input, $output) = @_;
    open my $fh, '>', $output or die "$output: $!";
    print $fh processFile($input);
}

ETA: Looking at your subroutine code now, the following optimizations occur to me:

$new_data .= $field[$i] . "|";
....
my $lastchar = chop($new_data);

No. Instead use join:

$new_data = join "|", @field;

This part:

 if ($field[$i] =~ /^ /) {
     $field[$i] = " ";
 } else {
     $field[$i] =~ s/ *$//g;
 }

...will either change the first field to a single space " ", if the first character is a space, or it will strip spaces from the end of the string. Is this really what you want? I.e. " foo" will be changed to " " (space).

I would imagine you're after something like:

$field[$i] =~ s/^ *//;
$field[$i] =~ s/ *$//;

In which case you can simply do:

for (@field) {
    s/^ *//;
    s/ *$//;
}

Which works as intended because $_ is aliased to each element in the array, and they will be altered by the substitution regex. A more verbose solution:

for my $value (@field) {
    $value =~ s/^ *//;
    $value =~ s/ *$//;
}

Or, better yet, you can include this in your split statement:

my $new_data = join "|", split /\s*\|\s*/, $row;
$new_data =~ s/^ *//;
$new_data =~ s/ *$//;

Or use a regex, which will probably be less expensive:

$row =~ s/\s*\|\s*/|/g;
$row =~ s/^ *//;
$row =~ s/ *$//;
my $new_data = $row;
Sign up to request clarification or add additional context in comments.

12 Comments

@zaman Ah yes, it needs to be declared at the start of the sub. Oh well, I guess you solved it. Good thing you had strict and warnings turned on eh? Otherwise, you would not have caught the mistake that I made. ;) This actually makes for an excellent example of why one should always use strict and warnings.
TLP thanks so much for your optimal solution and detailed explanation. I ran your modified version with little twick on $new_data since it was complaining for "Global symbol "$new_data" requires explicit package name at testTally.pl line 34." therefore, i declared it at the beginning as my $new_data = ""; Now the problem is when I run the program it only writes the last line of the input line to the output file meaning a single line entry. Do you have any clue why it might be behaving that way? Thanks again.
TLP when I run the program it only writes the last line of the input line to the output file meaning a single line entry. Do you have any clue why it might be behaving that way? Thanks again.
@zaman You put the declaration in the wrong place. Put my $new_data; right below my $file = shift;.
TLP thannks again for your response. That didn't resolve the issue. if you want i can post some sample data. Also, what does shift do? i tried to understad from the google but didn't quite get it.
|
0

Because you're using the $new_data variable as a global variable, and all your assignments are append assignments, this causes your data to build up without clearing it.

So to deal with this without calling reset every time, you can put the clearing inside the subroutine, so every time you call the subroutine, it will clear it automatically.

sub processFile 
 {
  $new_data = ""; #this should empty it out
for $row(@row) {
        chop($row);
        @field = split(/\|/, $row);
        for ($i=0; $i<@field; $i++){
            if ($field[$i] =~ /^ /) 
             {
                 $field[$i] = " ";
             }            
            else 
             {
                $field[$i] =~ s/ *$//g;
             }
            $new_data .= $field[$i] . "|";
        }
        $lastchar = chop($new_data);
        if(@field == 15) {
                $new_data .= "|0";
        }
        $new_data .= "\n";
}
# return $new_data;
} # END  sub processFile

This way, it should clear it whenever you run the function, so you won't have to do it manually.

Comments

0

Any use of Perl's reset function is archaic, but your use is also incorrect. If you want to erase the contents of $new_data, just say one of

$new_data = '';
$new_data = undef;
undef $new_data;

What reset('new_data') actually does is delete ALL symbols(*) that begin with the letters 'a','d','e','n','t','w', or '_'.

(*) - it wouldn't delete lexical variables, but this script doesn't have any of those.

1 Comment

...or, more correctly, just say "Don't use reset() when lexical variables are such a superior solution in every way possible".

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.