
I have several files with the same name in different folders, and I want to run my bash script on all of them in parallel. Is it possible to list them in a separate .txt file, or in the bash script itself, and run the script on each one? For example:

all.tab file

path/to/set1/my.bam
path/to/set2/my.bam
path/to/set3/my.bam

and a bash script:

#!/usr/bin/env bash
#$ -q cluster_name
#$ -cwd
#$ -N job_name
#$ -e /path/to/log
#$ -o /path/to/log
#$ -l job_mem=16G
#$ -pe serial 4

PICARD="path/to/picard"
BAM="/path/to/all.tab"

echo "validating bam file"

java -jar "$PICARD/picard.jar" ValidateSamFile I="$BAM" MODE=SUMMARY

The idea is that it launches several jobs to the queue in parallel and writes log outputs or other output files to the respective folders. If there is any other way, I would appreciate any help. EDIT: I invoke it as: qsub ./test.sh

  • How do you invoke the shell script, and which files do you invoke it on? Commented Nov 24, 2016 at 9:15
  • Do you mean invoking it as qsub ./test.sh path/to/set1/my.bam, qsub ./test.sh path/to/set2/my.bam, etc.? Commented Nov 24, 2016 at 9:26
  • No, because I want the script to be executed on the variable BAM, which should take all the files listed in the tab file. I am sorry if it is confusing. Commented Nov 24, 2016 at 9:53
  • I'm afraid it's quite vague, at least to me. Commented Nov 24, 2016 at 9:57
  • I don't know how to explain it better than this. I have several my.bam files in different folders and I want to execute one single bash script on all of them, launching qsub ./test.sh only once. Commented Nov 24, 2016 at 10:01

2 Answers


You can use the find command to first locate all files with the given name within your directory structure.

Then you can pipe that output to xargs and use its -P option to run a command on the found files in parallel.

See here for further details.
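As a rough illustration of the find and xargs -P combination, here is a minimal sketch. It assumes GNU or BSD find and xargs (the -print0, -0 and -P options are extensions, not POSIX), that java is on the PATH, and that Picard is started with java -jar. The function name validate_all and the log file name validate.log are made up for this example; each report is written next to its input file, as the question asks.

```shell
#!/usr/bin/env bash
# validate_all ROOT JAR [JOBS]
# Runs Picard ValidateSamFile on every my.bam found below ROOT,
# keeping up to JOBS processes running at the same time (default 4).
# validate_all and validate.log are hypothetical names for this sketch.
validate_all() {
    root=$1
    jar=$2
    jobs=${3:-4}
    # -print0 / -0 pass file names NUL-separated, so spaces in paths are safe.
    # -P sets the number of parallel processes; -I{} runs one command per file.
    find "$root" -type f -name 'my.bam' -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            bam=$1
            jar=$2
            # Write the report into the same folder as the input file.
            java -jar "$jar" ValidateSamFile I="$bam" MODE=SUMMARY \
                > "$(dirname "$bam")/validate.log" 2>&1
        ' _ {} "$jar"
}
```

Calling validate_all path/to path/to/picard/picard.jar 4 would then process the three my.bam files from the question, at most four at a time. Note that this parallelises within a single job on one machine; it does not submit separate jobs to the queue.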


1 Comment

Hi, so I did use parallel, but this is not exactly what I was looking for. I need something like an array, in which one command line works for several hundred files.

I was looking for something like this (though a more elegant way may exist).

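PICARD="path/to/picard"
# One BAM path per line. The loop below relies on word splitting,
# so the paths must not contain whitespace.
BAMFILES="path/to/set1/test.bam
path/to/set2/test.bam
path/to/set3/test.bam"

for f in $BAMFILES
do
    # picard.jar is a Java archive, so it is started through java -jar.
    java -jar "$PICARD/picard.jar" ValidateSamFile I="$f" MODE=SUMMARY
done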
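The loop above runs the files one after another inside a single job. If the goal is one qsub call that fans out into one queue task per file, a Grid Engine array job may be closer to it. The following is only a sketch under assumptions: the cluster accepts the -t option, the scheduler sets SGE_TASK_ID for each task (and sets it to "undefined" for non-array jobs), all.tab holds one path per line, and the task range 1-3 matches its three lines. The helper nth_line and the log name validate.log are made up for this example.

```shell
#!/usr/bin/env bash
#$ -cwd
#$ -N job_name
#$ -t 1-3
# The -t range above is assumed to equal the number of lines in all.tab.

# Placeholder locations, taken from the question; adjust as needed.
LIST="${LIST:-/path/to/all.tab}"
PICARD="${PICARD:-path/to/picard}"

# nth_line FILE N: print line N of FILE (hypothetical helper).
nth_line() {
    sed -n "${2}p" "$1"
}

# Only do work when running as an array task.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    # Each task picks the line of all.tab that matches its task number.
    bam=$(nth_line "$LIST" "$SGE_TASK_ID")
    echo "validating $bam"
    # Write the report into the same folder as the input file.
    java -jar "$PICARD/picard.jar" ValidateSamFile I="$bam" MODE=SUMMARY \
        > "$(dirname "$bam")/validate.log" 2>&1
fi
```

Submitted once with qsub ./test.sh, this should create three tasks that the scheduler can run in parallel, each handling one line of all.tab. For several hundred files, the range would need to be raised to match the number of lines in the list.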

3 Comments

You should probably clarify your own idea. You talk about doing things in parallel, but your own solution here doesn't. Your little script is nothing but a wrapper around another (Java-based) tool. What difference does it make whether your script is invoked 50 times, or invoked once and then loops over 50 entries given to it? Seriously: if you want to do things in parallel, then just look into my solution. In general, simply try to build on top of existing tools instead of re-inventing your own tooling.
It was an example. I posted my question here to get some suggestions. I am looking into your solution, but for a person who started with bash yesterday, this is not so straightforward. Thanks anyway.
OK, got it. And a hint for the bash newbie: try to do as few things as possible with bash scripts. Yes, you can script many things with bash, but there are a ton of things you can get wrong when you have just started using it. Thus: if you find solutions that work without you writing bash code, prefer those.
