I am new to scripting and am trying to concatenate multiple files, whose paths are listed in a text file, and output a single combined gzip file. For example, the list file File_list.txt contains these file paths:

/data/path/file1.txt
data2/path2/file2.txt
....file3.txt
....file4.txt

So far my code handles all the files in a single local directory (it outputs only the combined file, not gzipped):

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

my $directory = 'Users/xyz/Documents/';

opendir(my $dh, $directory) or die "Can't open $directory: $!";
# Keep plain files only; readdir also returns '.', '..' and subdirectories
my @files = grep { -f "$directory/$_" } readdir($dh);
closedir $dh;

my $outfilename = 'Combined.fastq';

my $outfilesrc = '';

foreach (sort @files) {
  $outfilesrc .= File::Slurp::slurp("$directory/$_");
}

open(my $out, '>', "$directory/$outfilename")
  or die "Can't open for writing: $directory/$outfilename : $!";
print $out $outfilesrc;
close $out;

exit;

Can someone please share how to read the files using this list rather than from one single directory? I know it's much easier in simple bash, but I am trying to create a module for a pipeline, so I need this in Perl. Thanks!

5
  • Try to read the list into an array, @files. See The correct way to read a data file into an array Commented Nov 30, 2015 at 21:50
  • "I know it's much easier in simple bash" That's only true when "simple bash" is the only language you know well Commented Nov 30, 2015 at 23:40
  • @Borodin Actually you don't need to "know well" bash scripting: cat `cat file_list` >> new_file is simpler than that perl script Commented Dec 1, 2015 at 8:29
  • Well, unless you want to deal with edge cases properly, because that example will break on e.g. filenames with spaces. Commented Dec 1, 2015 at 11:27
  • Yes, the command line/bash was easy for quick tasks with a limited number of files. Since the files can now be scattered all over the directory tree and come from different sources, I need to code it to avoid any possible error Commented Dec 1, 2015 at 15:25

2 Answers

3

Your code doesn't do anything with a zip file yet, so I can't guess what you need there (Archive::Zip is pretty good, though).

For concatenating a bunch of files, you can make use of the ARGV or <> filehandle.

#!/usr/bin/env perl
use strict;
use warnings;

open ( my $combined, '>', 'combined.fastq') or die $!; 

select $combined; 
print while <>; 

close $combined; 

That should do the trick: you open an output file, select it as the default place to print, and then print every line read from <>, which is all the data in any files specified on the command line, or piped data on STDIN.

So invoking this script as merge.pl *.txt will take all the text files (in the current directory) and merge them into the combined file.

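As you've got an input list, load it into @ARGV before the print while <> loop - that's as simple as:

open ( my $list_of_files, '<', 'file_list.txt' ) or die "Can't open file_list.txt: $!";
chomp ( @ARGV = <$list_of_files>);
close ( $list_of_files ); 

With @ARGV filled from the list, <> reads those files in order, so the overall result is the same as naming them on the command line.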
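Since the question asks for gzipped output, here is a minimal sketch that puts the two pieces together and writes the combined data through IO::Compress::Gzip (a core module). The sub name combine_to_gzip, the script name merge_gz.pl and the output name Combined.fastq.gz are made up for illustration. It assumes one path per line in the list file, and it checks readability up front because <> only warns about files it cannot open.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# Read the paths in $list_file and write their concatenated contents,
# gzip-compressed, to $out_gz. Returns the number of input files.
# (Hypothetical helper, not part of any module.)
sub combine_to_gzip {
    my ($list_file, $out_gz) = @_;

    open(my $list, '<', $list_file) or die "Can't open $list_file: $!";
    chomp(my @paths = <$list>);
    close $list;
    @paths = grep { length } @paths;    # skip blank lines

    # With an empty @ARGV, <> would read STDIN instead, so refuse that case
    die "No input files listed in $list_file\n" unless @paths;

    # <> only warns about files it cannot open, so check up front
    for my $path (@paths) {
        die "Can't read $path\n" unless -r $path;
    }

    my $gz = IO::Compress::Gzip->new($out_gz)
        or die "Can't create $out_gz: $GzipError";

    local @ARGV = @paths;               # <> reads these files in order
    while (my $line = <>) {
        $gz->print($line);              # compress one line at a time
    }
    $gz->close or die "Can't finish $out_gz: $GzipError";

    return scalar @paths;
}

# Usage: perl merge_gz.pl File_list.txt Combined.fastq.gz
if (@ARGV == 2) {
    my $count = combine_to_gzip(@ARGV);
    print "Combined $count files into $ARGV[1]\n";
}
```

Reading line by line keeps memory use flat regardless of file size. Note that <> opens files with two-argument open, so paths with leading or trailing whitespace, or starting with characters such as > or |, would need an explicit three-argument open instead.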

4 Comments

Thanks a lot for the reply. Yes, I didn't know I would have to apply this script to a large set of files and then zip them up at the end. Also, I might have to use this script on zip files too, so I am thinking of using zcat for that. Would that be the recommended way?
zcat is for gzipped files. You cannot use zcat on zipped files. Archive::Zip is not for gzipped files. Look at IO::Compress::Gzip and IO::Uncompress::Gunzip for gzip. Look for instance here: unix.stackexchange.com/questions/48690/…
Sorry, it's my fault: I am using it for multiple gzipped files, which would be GBs in size, and since these are huge files I need to keep the space/time used in mind too.
Different question that. Worth asking separately - and explaining what you are trying to accomplish.
0

Thanks a lot for your replies. The script runs well now; being new to Perl, it sounded like a difficult one to me. Just posting my code below:

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use IO::Compress::Gzip qw(gzip $GzipError);


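my @data = read_file('./File_list.txt');
my $out = "./test.txt";

foreach my $data_file (@data) {
    chomp($data_file);
    # Note: the path is interpolated into a shell command, so names
    # containing spaces or shell metacharacters will break this
    system("cat $data_file >> $out") == 0
        or die "cat failed for $data_file: $?";
}
my $outzip = "./test.gz";
gzip $out => $outzip
    or die "gzip failed: $GzipError";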
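A possible variant that avoids both the shell and the intermediate test.txt: the functional gzip interface also accepts a reference to an array of input file names. The sketch below is untested against real pipeline data; the sub name gzip_listed_files and the script name gzip_list.pl are made up for illustration, and it uses only core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Compress every file named in $list_file into the single gzip file $out_gz.
# Returns the number of input files. (Hypothetical helper for illustration.)
sub gzip_listed_files {
    my ($list_file, $out_gz) = @_;

    open(my $list, '<', $list_file) or die "Can't open $list_file: $!";
    chomp(my @paths = <$list>);
    close $list;
    @paths = grep { length } @paths;    # skip blank lines
    die "No input files listed in $list_file\n" unless @paths;

    # An array reference as input makes gzip read each listed file in order;
    # no shell is involved, so spaces in file names are not a problem
    gzip(\@paths => $out_gz)
        or die "gzip failed: $GzipError";

    return scalar @paths;
}

# Usage: perl gzip_list.pl File_list.txt test.gz
if (@ARGV == 2) {
    my $count = gzip_listed_files(@ARGV);
    print "Compressed $count files into $ARGV[1]\n";
}
```

Because nothing is appended to an existing file, running the script twice does not duplicate data the way cat ... >> test.txt does, and no uncompressed copy of the combined data is written to disk.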
