I have multiple samples, each with paired R1 and R2 reads in fastq.gz format (the two files of a sample complement each other). I want to run BWA-MEM in paired-end mode, in parallel, on all the files, so that each R1/R2 pair produces a single SAM file. Right now I end up with two SAM files, one from each read file.

This is what I have come up with, but it's not doing what I need it to do:

for i in `find -maxdepth 2 -iname *fastq.gz -type f`; do
   echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta  ${i}_R1_001.fastq.gz  ${i}_R2_001.fastq.gz > ${i}_R1_R2.sam"
done

When it runs, the output looks like this:

bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta  ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_001.fastq.gz ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R2_001.fastq.gz > ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz_R1_R2.sam

bwa mem -t 12 H.Sapiens/ucsc.hg19.fasta  ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_001.fastq.gz ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R2_001.fastq.gz > ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz_R1_R2.sam
-bash-4.1$

I understand the problem is in the -iname pattern, but how do I fix it? Thank you so much.

  • as you can see, two different answers. So, you could add the wanted output example... ;) Commented Mar 13, 2015 at 20:42

2 Answers

Try

find -maxdepth 2 -iname \*fastq.gz -type f |
sed 's/_R[12]_001\.fastq\.gz$//' |
sort -u | 
while IFS= read -r f; do
   echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta \"${f}_R1_001.fastq.gz\"  \"${f}_R2_001.fastq.gz\" > \"${f}_R1_R2.sam\""
done
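
To make the sed step above concrete, here is a small sketch of what it does to the file names from the question. The function name sample_prefixes is only an illustration (it is not part of the answer), and the paths are the example names shown in the question; the R2 file of Sample_0748 is assumed to exist alongside the R1 file.

```shell
#!/bin/bash
# Hypothetical helper: read FASTQ paths (one per line) on stdin and print
# each sample prefix once, with the _R1/_R2 read suffix stripped.
sample_prefixes() {
  sed 's/_R[12]_001\.fastq\.gz$//' | sort -u
}

# Example paths modelled on the question; no files need to exist for this.
printf '%s\n' \
  ./Sample_0747/0747_CGG_L001_R1_001.fastq.gz \
  ./Sample_0747/0747_CGG_L001_R2_001.fastq.gz \
  ./Sample_0748/0748_CCA_L001_R1_001.fastq.gz \
  ./Sample_0748/0748_CCA_L001_R2_001.fastq.gz |
sample_prefixes
# prints:
# ./Sample_0747/0747_CGG_L001
# ./Sample_0748/0748_CCA_L001
```

Each prefix then appears only once, so the loop builds exactly one bwa command per R1/R2 pair.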

Don't loop over a value parsed like that*. First, put your code in a script for sanity's sake, like

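cat > script <<'SCRIPT'
#!/bin/bash
# find passes both the R1 and the R2 file; act once per pair, on the R1 file only
for i; do
  case $i in *_R1_001.fastq.gz) ;; *) continue ;; esac
  base=${i%_R1_001.fastq.gz}   # strip the read suffix to get the sample prefix
  bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta "${base}_R"{1,2}_001.fastq.gz > "${base}_R1_R2.sam"
done
SCRIPT
chmod +x script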

Then, either use the -exec predicate, or xargs, like

find -maxdepth 2 -iname '*fastq.gz' -type f -exec ./script {} +

or

find -maxdepth 2 -iname '*fastq.gz' -type f -print0 | xargs -0 ./script

*It says "parsing ls", but it applies to parsing any command meant for human consumption. find is expressly called out.
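
Since the question also asks for the runs to happen in parallel, one possible approach is the -P option of xargs (available in GNU and BSD xargs, though not required by POSIX). The following is only a dry-run sketch: print_cmd is a hypothetical helper that echoes the bwa command instead of running it, and the job count of 2 is an arbitrary assumption. Note that each bwa mem already uses 12 threads here, so -t and -P would need to be balanced against the cores actually available.

```shell
#!/bin/bash
# Hypothetical dry-run helper: given one R1 file, print the paired-end
# bwa command for that sample instead of executing it.
print_cmd() {
  local r1=$1
  [ -n "$r1" ] || return 0            # nothing to do if called without a file
  local base=${r1%_R1_001.fastq.gz}   # sample prefix without the read suffix
  echo "bwa mem -t 12 /H.Sapiens/ucsc.hg19.fasta ${base}_R1_001.fastq.gz ${base}_R2_001.fastq.gz > ${base}_R1_R2.sam"
}
export -f print_cmd   # make the function visible to the bash started by xargs

# Select only the R1 files, then hand them out one at a time to at most
# 2 concurrent jobs (-P 2 is an assumption; adjust to the machine).
find . -maxdepth 2 -iname '*_R1_001.fastq.gz' -type f -print0 |
  xargs -0 -n 1 -P 2 bash -c 'print_cmd "$1"' _
```

Once the printed commands look right, the echo inside print_cmd could be replaced with the real bwa call and redirection.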


On another note, if you don't put quotes around your arguments to find, the shell may interpret them as globs.

find -iname *fastq.gz

could expand to

find -iname foofastq.gz barfastq.gz bazfastq.gz

You want

find -iname '*fastq.gz'
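
If you want to see the difference for yourself, here is a small sketch that runs in a throwaway directory. show_args is a hypothetical stand-in for find that just prints the arguments it receives; the file names are the made-up ones from above.

```shell
#!/bin/bash
# Hypothetical stand-in for find: print every argument in angle brackets
# so it is obvious what the shell actually passed.
show_args() {
  printf '<%s>' "$@"
  echo
}

demo_dir=$(mktemp -d)
(
  cd "$demo_dir" || exit 1
  touch foofastq.gz barfastq.gz

  # Unquoted: the shell expands the glob before the command ever runs.
  show_args -iname *fastq.gz      # <-iname><barfastq.gz><foofastq.gz>

  # Quoted: the pattern arrives intact, as find needs it.
  show_args -iname '*fastq.gz'    # <-iname><*fastq.gz>
)
rm -rf "$demo_dir"
```

If no file in the current directory matches, bash by default leaves the unquoted pattern as it is, which is why the unquoted form sometimes appears to work.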
