I have files with names like:

0195_R1.fastq
0195_R2.fastq
0196_R1.fastq
0196_R2.fastq
0197_R1.fastq
0197_R2.fastq

and so on.

I need to run a program on each pair of files (the R1 and R2 files correspond to each other), like this:

bowtie2 -x index_files -1 0195_R1.fastq -2 0195_R2.fastq -S 0195_output.sam

With many pairs I'd have to run the command many times, so I tried to write a bash script using a for loop, but I've had no success. I also don't know how to name the output files sequentially.

I've tried the following:

for R1 in $FQDIR/*_R1.fastq; do
for R2 in $FQDIR/*_R2.fastq; do

    bowtie2 -x index_files -1 $R1 -2 $R2 -S $N_output.sam

done
done

What should I do?

  • for R1 in $FQDIR/*_R1.fastq; do for R2 in $FQDIR/*_R2.fastq; do runs the body for each R1 and for each R2, so for every combination. Do it more simply: iterate over the R1 files only (the first loop), then extract the <this part> of <this part>_R1.fastq from the filename with basename and cut. Once you have "this part", the rest is easy. Note that $N_output would be interpreted as the variable N_output; you probably want ${N}_output. Commented Jan 16, 2020 at 20:30
  • These indexes will do the job: for i in {195..197}; do bowtie2 -x index_files -1 0${i}_R1.fastq -2 0${i}_R2.fastq -S 0${i}_output.sam; done Commented Jan 17, 2020 at 6:45
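
The basename and cut suggestion from the first comment could be sketched roughly as follows. The function names sample_id and print_commands are made up for illustration, and the sketch assumes the sample number itself contains no underscore. It only prints the bowtie2 commands, so the pairing can be checked before anything is run.

```shell
#!/bin/bash
# Hypothetical helper: extract the sample number ("0195") from a path
# such as /data/0195_R1.fastq using basename and cut.
sample_id() {
    basename "$1" | cut -d_ -f1
}

# Hypothetical helper: print (not run) one bowtie2 command per R1 file.
print_commands() {
    local dir=$1 r1 n
    for r1 in "$dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue    # skip the literal pattern if nothing matched
        n=$(sample_id "$r1")
        echo "bowtie2 -x index_files -1 $r1 -2 $dir/${n}_R2.fastq -S ${n}_output.sam"
    done
}
```

Calling print_commands "$FQDIR" should list one command per R1 file; removing the echo would run them for real.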

1 Answer

If you loop over all the R1 and R2 files, you'll run bowtie2 for every possible pair of data files. If I understand correctly, that's not what you want - you only want to process the corresponding pairs.

To do that, loop over R1 files only, and try to find the corresponding R2 file for each:

#!/bin/bash
fqdir=...
for r1 in "$fqdir"/*_R1.fastq; do
    r2=${r1%_R1.fastq}_R2.fastq
    if [[ -f $r2 ]] ; then
        bowtie2 -x index_files -1 "$r1" -2 "$r2" -S "$N"_output.sam
    else
        echo "$r2 not found" >&2
    fi
done

I'm not sure what $N stands for. Maybe you can use $r1 instead?
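
If $N is meant to be the sample number at the start of each file name, it could be derived from $r1 with the same kind of parameter expansion. The sketch below makes a few assumptions that are not in the question: it wraps the loop in a function called align_pairs, it takes the aligner command as an optional second argument (defaulting to bowtie2) so the pairing can be dry-run with echo, and it writes the SAM files to the current directory.

```shell
#!/bin/bash
# Usage: align_pairs FASTQ_DIR [ALIGNER]
# Pass "echo" as ALIGNER to print the arguments instead of running bowtie2.
align_pairs() {
    local fqdir=$1 aligner=${2:-bowtie2} r1 r2 n
    for r1 in "$fqdir"/*_R1.fastq; do
        r2=${r1%_R1.fastq}_R2.fastq    # swap the trailing _R1.fastq for _R2.fastq
        n=${r1##*/}                    # strip the directory: 0195_R1.fastq
        n=${n%_R1.fastq}               # strip the suffix:    0195
        if [ -f "$r2" ]; then
            "$aligner" -x index_files -1 "$r1" -2 "$r2" -S "${n}_output.sam"
        else
            echo "$r2 not found" >&2
        fi
    done
}
```

Running align_pairs "$fqdir" echo first shows the arguments each bowtie2 call would receive; dropping the second argument runs the real alignments.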

7 Comments

Seems like $N is just the number prefixing each pair, so you can replace it with ${r1/_*/}.
Another point that I think can be improved (just for clarity) is r2=${r1/_R1/_R2} instead of r2=${r1%_R1.fastq}_R2.fastq.
The $N is for the number prefixing each pair. I replaced it with $r1. It worked. Thanks!
@accdias now I've seen your comment and used your solution too. Thanks!
@accdias: Using substitution is more dangerous: the path might contain _R1 somewhere else, too.
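
The difference shows up as soon as _R1 occurs earlier in the path. A small sketch, using a made-up directory name run_R1_batch:

```shell
#!/bin/bash
# Hypothetical path whose directory name also contains "_R1".
r1=/data/run_R1_batch/0195_R1.fastq

# Substitution replaces the FIRST match, which here is in the directory name.
by_substitution=${r1/_R1/_R2}

# Suffix removal only touches the end of the string.
by_suffix=${r1%_R1.fastq}_R2.fastq

echo "$by_substitution"   # /data/run_R2_batch/0195_R1.fastq
echo "$by_suffix"         # /data/run_R1_batch/0195_R2.fastq
```

The same caveat applies to ${r1/_*/}: it removes everything from the first underscore in the whole path, not just in the file name, so with the path above it would yield /data/run.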