I have a problem with a bash script I am trying to use. I have a directory with thousands of files and I want to run a command sequentially on each file. However, the files come in pairs (e.g. File1.sam and File1.gz, File2.sam and File2.gz, etc.), and the command I am executing requires both files of a pair as arguments. I have been using something similar to the loop below when only a single argument was required, and I thought (wrongly) that I could simply extend it like this:

shopt -s nullglob
for myfile1 in *.sam && for myfile2 in *.gz 
do
./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta    $myfile1 $myfile2 > $myfile1.sam2 2>$myfile1.log
done

Does anyone know how I can modify this, or can you point me towards another way of doing it?

4 Answers

2

Why not generate the second filename from the first, e.g. by replacing .sam with .gz?

for myfile1 in *.sam  ; do
  myfile2="${myfile1%.sam}.gz"
  [ -e "$myfile2" ] || continue
  ./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta "$myfile1" "$myfile2" > "$myfile1".sam2 2>"$myfile1".log
done
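
To check the pairing before running the real command, a dry run along these lines may help. This is only a sketch: the helper name `pair_for` is hypothetical and not part of the answer, and the loop just prints the pairs it finds instead of calling bwa.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): derive the .gz partner of a .sam file
pair_for() {
  printf '%s\n' "${1%.sam}.gz"   # strip the trailing ".sam", then append ".gz"
}

# Dry run: report each pair found in the current directory
for sam in *.sam; do
  [ -e "$sam" ] || continue      # skip the literal "*.sam" when nothing matches
  gz=$(pair_for "$sam")
  if [ -e "$gz" ]; then
    printf 'pair: %s %s\n' "$sam" "$gz"
  else
    printf 'skipping %s: no %s\n' "$sam" "$gz" >&2
  fi
done
```

Once the printed pairs look right, the `printf 'pair: ...'` line can be swapped for the actual command.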

3 Comments

Looks good. My only suggestion for improvement would be to use a stricter parameter expansion: myfile2="${myfile1%.sam}.gz".
Oh, I didn't see that you had removed the $myfile1 parameter expansions from the quotes on the ./bwa line. Why did you do this? This will break due to wordsplitting if the filenames have whitespace.
@JoshCartwright I didn't see the quotes around the line. In general I tend to quote everything, but I didn't see the quotes at the beginning of the line.
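
The word-splitting problem raised in the comments can be seen with a small experiment. The sketch below assumes a made-up filename containing a space and a hypothetical helper, `count_args`, that only reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received
count_args() {
  printf '%s\n' "$#"
}

myfile1="my sample.sam"          # made-up filename containing a space

unquoted=$(count_args $myfile1)  # unquoted: split into "my" and "sample.sam"
quoted=$(count_args "$myfile1")  # quoted: passed as a single argument

printf 'unquoted: %s argument(s)\n' "$unquoted"
printf 'quoted:   %s argument(s)\n' "$quoted"
```

With the default IFS, the unquoted call sees two arguments and the quoted call sees one, which is why the filenames passed to bwa should be quoted.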
1
shopt -s nullglob
for myfile1 in *.sam
do
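  # Derive the .gz name from the .sam name (dot escaped so it matches literally)
  myfile2=$(printf '%s\n' "$myfile1" | sed 's/\.sam$/.gz/')
  ./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta "$myfile1" "$myfile2" > "$myfile1.sam2" 2>"$myfile1.log"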
done
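
One detail of the sed approach: in a regular expression an unescaped dot matches any character. Inside the loop the `*.sam` glob already guarantees a literal `.sam` suffix, so the difference mainly matters if the substitution is reused elsewhere. A small sketch with a made-up name:

```shell
#!/usr/bin/env bash
# Made-up name that ends in "sam" but has no dot before it
name="mysam"

loose=$(printf '%s\n' "$name" | sed 's/.sam$/.gz/')    # "." matches any character
strict=$(printf '%s\n' "$name" | sed 's/\.sam$/.gz/')  # "\." matches a literal dot only

printf 'loose:  %s\n' "$loose"
printf 'strict: %s\n' "$strict"
```

The loose pattern rewrites the name to `m.gz`, while the strict pattern leaves `mysam` untouched.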

1

Iterate over the files with only one of the extensions (for instance *.gz) and use a tool such as sed to derive the matching .sam file name.

Something like this:

for myfile2 in *.gz
do
  # Derive the matching .sam name from the .gz name
  myfile1=$(printf '%s\n' "$myfile2" | sed -e 's#\.gz$#.sam#')
  ./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta "$myfile1" "$myfile2" > "$myfile1.sam2" 2>"$myfile1.log"
done

0

Change your for loop to iterate over one of the file extensions and derive the other file name from it. For example:

for p in a b c; do touch $p.1 $p.2; done
for f in *.1; do g=${f%.1}.2; echo "$f" "$g"; done

This displays:

a.1 a.2
b.1 b.2
c.1 c.2
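
How much of the name is removed depends on the pattern and on whether `%` or `%%` is used, which matters when names contain more than one dot. A brief sketch, using a made-up filename:

```shell
#!/usr/bin/env bash
f="sample.fastq.1"     # made-up filename with two dots

shortest=${f%.*}       # "%" removes the shortest matching suffix  -> sample.fastq
longest=${f%%.*}       # "%%" removes the longest matching suffix  -> sample
unchanged=${f%%.}      # pattern "." matches only a trailing dot, so nothing is removed

printf '%s\n' "$shortest" "$longest" "$unchanged"
```

For paired files, stripping the exact known extension (for example `${f%.1}`) is usually the least surprising choice.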
