This is my Perl script at the moment:

#!/usr/bin/perl
use open qw/:std :utf8/;
use strict;
use warnings;

if (defined $ARGV[0]){
my $filename = $ARGV[0];
my %count;

open (my $fh, $filename) or die "Can't open '$filename' $!";
while (<$fh>)
{
        $count{ lc $1 }++ while /(\w+)/g;
}
close $fh;

my $array = 0;

foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count)
{
    print "$count{$word} $word\n" if $array++ < 10;
}

}else{
print "Please enter the name of the file: ";
my $filename = ($_ = <STDIN>);

my %count;

open (my $fh, $filename) or die "Can't open '$filename' $!";
while (<$fh>)
{
        $count{ lc $1 }++ while /(\w+)/g;
}
close $fh;

my $array = 0;

foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count)
{
    print "$count{$word} $word\n" if $array++ < 10;
}
}

And this is my Python script at the moment:

#!/usr/bin/env python3
import os

perlscript = "perl " + " perlscript.pl " + " /home/user/Desktop/data/*.txt " + " >> " + "/home/user/Desktop/results/output.txt"
os.system(perlscript)

Problem: When there are multiple txt-files in the data folder the script only runs on one file and ignores all the other txt-files. Is there a way to run the perlscript on all the txt-files at once?

Another problem: I'm also trying to delete the txt-files with the os.remove after they have been executed but they get deleted before the perlscript has a chance to execute.

Any ideas? :)

  • In Python, use glob.glob() or os.listdir() to get all the files in the directory and run the Perl script in a for loop, once per file. Alternatively, add a loop to the Perl script that reads the next filename(s) from $ARGV[1], $ARGV[2], etc. As I remember, Perl can use shift to take the next element from @ARGV. Commented May 12, 2019 at 22:40
  • When on Unix or Linux try to change if (defined $ARGV[0]){ my $filename = $ARGV[0]; to foreach my $filename (@ARGV) {. Commented May 12, 2019 at 22:52
  • @furas Thank you! It worked with glob! Commented May 12, 2019 at 23:44

1 Answer

That Perl script processes only one file: it reads $ARGV[0] and ignores any further arguments. So even though the shell started by os.system expands the * glob into the full file list, only the first file is ever processed.

Instead, build the file list in Python, using os.listdir (or os.scandir) or os.walk, or glob.glob. Then iterate over the list and call that Perl script on each file, if it must process only one file at a time. Or, modify the Perl script to process multiple files and run it once with the whole list.
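The directory-listing options named above can be sketched as follows. This is only an illustration: the helper names (txt_files_listdir and so on) are made up for this example, and each returns a sorted list of full paths to the .txt files directly inside one directory.

```python
import glob
import os


def txt_files_listdir(data_path):
    """os.listdir returns bare names, so join each with the directory."""
    return sorted(
        os.path.join(data_path, name)
        for name in os.listdir(data_path)
        if name.endswith(".txt")
    )


def txt_files_scandir(data_path):
    """os.scandir yields entries that already carry their full path."""
    with os.scandir(data_path) as entries:
        return sorted(
            entry.path
            for entry in entries
            if entry.is_file() and entry.name.endswith(".txt")
        )


def txt_files_glob(data_path):
    """glob.glob does the pattern matching itself and returns full paths."""
    return sorted(glob.glob(os.path.join(data_path, "*.txt")))
```

For a flat directory of ordinary files the three give the same list. One difference to keep in mind is that the glob pattern *.txt skips names starting with a dot, while the endswith test does not.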

To keep the current Perl script and run it on each file:

import os

data_path   = "/home/user/Desktop/data/"
output_path = "/home/user/Desktop/results/"

for file in os.listdir(data_path):
    if not file.endswith(".txt"):
        continue

    print("Processing " + file)                      # better use subprocess
    run_perlscript = "perl " + " perlscript.pl " + \
        data_path + file  + " >> " + output_path + "output.txt"
    os.system(run_perlscript)

The Perl script should be rewritten to remove the duplicated code: both branches do the same work and differ only in where the filename comes from.

However, it is better to use the subprocess module to run and manage external commands. This is advised even in the os.system documentation itself. For instance:

import os
import subprocess

# data_path and output_path as defined above
with open(output_path + "output.txt", "a") as fout:
    for file in os.listdir(data_path):
        if not file.endswith(".txt"):
            continue
        subprocess.run(["perl", "perlscript.pl", data_path + file], stdout=fout)

where the output file is opened in append mode ("a"), matching the question's >> redirection.

The recommended subprocess.run is available since Python 3.5; on older versions use subprocess.Popen.
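On the question's second problem, deleting each file once it has been processed: subprocess.run blocks until the command has exited, so a removal placed right after the call cannot run too early. Below is a minimal sketch under that assumption. The function name run_then_remove and its parameters are hypothetical, and it treats exit status 0 as success, which fits the original script since die exits with a non-zero status.

```python
import os
import subprocess


def run_then_remove(command, files, output_file):
    """Run command + [file] for each file, appending stdout to output_file.

    A file is removed only after its command has exited with status 0.
    Returns the files that were left in place because the command failed.
    """
    failed = []
    with open(output_file, "a") as fout:
        for file in files:
            # subprocess.run returns only after the child process has exited
            result = subprocess.run(command + [file], stdout=fout)
            if result.returncode == 0:
                os.remove(file)
            else:
                failed.append(file)
    return failed


# Hypothetical usage with the names from the question:
# left = run_then_remove(["perl", "perlscript.pl"], files,
#                        "/home/user/Desktop/results/output.txt")
```

Returning the failed files rather than raising keeps one bad file from stopping the whole run; pass check=True to subprocess.run instead if stopping on the first failure is preferred.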

Another, and arguably the "right," option is to adjust the Perl script so that it can process multiple files. Then you only need to run it once, with the whole file list.

use strict;
use warnings;
use feature 'say';    
use open ':std', ':encoding(UTF-8)';

foreach my $filename (@ARGV) {
    say "Processing $filename";

    my %count;

    open my $fh, '<', $filename  or do {
       warn "Can't open '$filename': $!";
       next;
    };
    while (<$fh>) {   
        $count{ lc $1 }++ while /(\w+)/g;
    }   
    close $fh;

    my $prn_cnt = 0;
    foreach my $word ( sort { $count{$b} <=> $count{$a} } keys %count) {   
        print "$count{$word} $word\n" if $prn_cnt++ < 10; 
    }   
}

This prints a warning for a file that it can't open and skips to the next one. If you'd rather have the script exit on any file it can't open, replace the or do { ... }; with the original or die ...;.

Then run it once with the whole list, this time using glob.glob to build it:

import glob
import subprocess

data_path   = "/home/user/Desktop/data/"
output_path = "/home/user/Desktop/results/"

files = glob.glob(data_path + "*.txt")

with open(output_path + "output.txt", "a") as fout:
    # unpack the list so that each file is its own argument
    subprocess.run(["perl", "perlscript.pl", *files], stdout=fout)

Since this passes the whole list as command-line arguments, it assumes that there aren't so many files (many thousands) that the command line exceeds the operating system's limit on argument length.
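If the file list could grow large enough to hit that limit, one way around it is to pass the files in batches. The sketch below assumes a script that accepts any number of file arguments; the function name run_in_batches and the default batch_size of 1000 are arbitrary choices for illustration, not tuned values.

```python
import subprocess


def run_in_batches(command, files, output_file, batch_size=1000):
    """Run command once per batch of files, appending stdout to output_file.

    Returns the number of times the command was run.
    """
    runs = 0
    with open(output_file, "a") as fout:
        for start in range(0, len(files), batch_size):
            batch = files[start:start + batch_size]
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(command + batch, stdout=fout, check=True)
            runs += 1
    return runs


# Hypothetical usage with the multi-file Perl script:
# run_in_batches(["perl", "perlscript.pl"], files,
#                "/home/user/Desktop/results/output.txt")
```

Because the script reports per file, splitting the list into batches does not change the output, only how many times perl is started.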
