0

In the bash below I am looping through the pairs of .fastq files and using them in the commented command. The variable $pre has the name in it and it does extract it, the problem that I can't figure out is how to only use it in the commented command once? In the example below $pre is NA11111 but is extracted twice. Is there a way to only use it once in the command? I have tried removing duplicates with awk with no luck and trying cut. Thank you :).

Bash

 for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
 sample=${file%.fastq}
 bname=`basename $sample`
 pre="$(echo $bname|cut -d- -f1,1)"

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam
   echo "$sample.fastq"
   echo "$sample"
   echo "$pre"
   done

current output

/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001         `this is $sample`
NA11111                                                                   `this is $pre`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001.fastq   `this is $sample.fastq`
/home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R2_001         `this is $sample`
NA11111                                                                   `this is $pre`

desired output

#bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" /home/cmccabe/Desktop/fastq/${pre}_aln.sam

$sample.fastq = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001.fastq
$sample = /home/cmccabe/Desktop/fastq/NA11111-100ng-E08A-C06_S5_L001_R1_001
$pre = NA11111
5
  • The NA11111 value is seen in both R1 and R2 files. So what logic is used to know if R1 or R2 is the file you want? Of is once you find a file for NA11111, other files with NA11111 are to be discarted? If so, you could extract the NA????? values present, list the files with that prefix and keep only the first one (head -1). Commented Jul 17, 2018 at 13:36
  • 2
    Are you purposely trying to ignore the .fastq file with "R2" in it? Commented Jul 17, 2018 at 13:36
  • 1
    Why are you trying to do a single end alignment of a pair end experiment? Is there any reason to discard half of the data? Commented Jul 17, 2018 at 13:47
  • The bwa command does a paired end aliment using the R1 and R2 of the same sample but that is NA11111 which is duplicated. Am I missing something? Thank you :) Commented Jul 17, 2018 at 13:52
  • Thank you for the catch, I think I am missing something in the command and need to modify it. This is a bit new to me, used to working with single end alignment. Thank you :). Commented Jul 17, 2018 at 13:57

2 Answers 2

1

The simplest thing to do is just keep track of which items you've already seen, and skip the current file if it is a match.

declare -A seen=()

for file in /home/cmccabe/Desktop/fastq/*.fastq ; do
  sample=${file%.fastq}
  bname=$(basename "$sample")
  pre=${name%%-*}

  # Go to the next file if $pre has already been seen
  [[ -v seen[$pre] ]] && continue

  # Remember that we've now seen $pre
  seen[$pre]=

  bwa mem -M -t 16 /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta "$sample.fastq" "$sample" "/home/cmccabe/Desktop/fastq/${pre}_aln.sam"
done
Sign up to request clarification or add additional context in comments.

Comments

1

I think that you are trying to achieve the following:

for file in /home/cmccabe/Desktop/fastq/*_R1_*.fastq
do
    file2=$(echo $file | sed 's/_R1_/_R2_/')
    sample=$(basename $file .fastq | cut -d- -f1)

    bwa mem -M -t 16 -R "@RG\tID:$sample\tSM:$sample" /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta $file $file2 > /home/cmccabe/Desktop/fastq/${sample}_aln.sam
done

This is, in my opinion, the best common-sense processing of your data. I made the assumtions that you will need both ends and that you will postprocess the result, so the ReadGroup line would be required.

1 Comment

Thank you very much @Poshi, I appreciate it :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.