I have a command I need to run for multiple combinations of files. The command looks like this:

myscript.pl -output_directory /path/output_"$TARGET_SAMPLE"_vs_"$NORMAL_SAMPLE" -target_sample /path/$TARGET_SAMPLE.bam -normal_sample /path/$NORMAL_SAMPLE.bam

I want to run this for multiple sets of samples without having to change the paths by hand each time. Right now I set the samples manually before running it, like this:

export TARGET_SAMPLE="sample_1"
export NORMAL_SAMPLE="sample_2"

How do I run this to make sure the TARGET_SAMPLE and NORMAL_SAMPLE are always correctly matched? For each NORMAL_SAMPLE I need to run the script twice with two different TARGET_SAMPLE files. I think using an array could work but I don't know how to correctly feed that into a for loop.

Here are a few examples of the pairings I need to run:

export TARGET_SAMPLE="sample_1"
export NORMAL_SAMPLE="sample_2"

export TARGET_SAMPLE="sample_3"
export NORMAL_SAMPLE="sample_2"

export TARGET_SAMPLE="sample_4"
export NORMAL_SAMPLE="sample_5"

export TARGET_SAMPLE="sample_6"
export NORMAL_SAMPLE="sample_5"

So for the first pairing in this list, the command submitted in the shell would be:

myscript.pl -output_directory /path/output_sample_1_vs_sample_2 -target_sample /path/sample_1.bam -normal_sample /path/sample_2.bam

and the second would be:

myscript.pl -output_directory /path/output_sample_3_vs_sample_2 -target_sample /path/sample_3.bam -normal_sample /path/sample_2.bam

Thanks for your help.

  • You can use a couple of arrays (namely target and normal) with the pairs occupying the same positions in both arrays. Commented Jan 28, 2019 at 14:18
  • So would that involve two for loops, one cycling through each array? Commented Jan 28, 2019 at 14:40
  • It is hard to say what is wrong with your code because you did not provide it or the errors you encountered. Also see How to create a Minimal, Complete, and Verifiable example. Possible duplicate of bash shell nested for loop Commented Jan 28, 2019 at 14:56
  • I don't have any errors for this because I don't know how to create a for loop that pairs specific variables together rather than just looping through a single set of variables or all possible combinations. I don't see it as a straightforward nested loop because I have a hardcoded list of possible combinations of TARGET_SAMPLE and NORMAL_SAMPLE that should be run together. Commented Jan 28, 2019 at 15:02
  • No, only a single loop. Say normal=(n1 n3 n3) and target=(t2 t3 t4). Both arrays have three elements. As long as you keep the ordering you are interested in, you can loop with an index n from 0 to length(normal) - 1 and use ${normal[n]} and ${target[n]} as your parameters. Commented Jan 28, 2019 at 15:32

1 Answer

Method 1 using while-loop reading multiple values from a "here-document":

export TARGET_SAMPLE NORMAL_SAMPLE

# special characters in the values (eg. space) will cause problems
while read -r TARGET_SAMPLE NORMAL_SAMPLE ANYTHING_ELSE; do
    # insert sanity checks here
    myscript.pl -output_directory /path/output_"$TARGET_SAMPLE"_vs_"$NORMAL_SAMPLE" -target_sample /path/"$TARGET_SAMPLE".bam -normal_sample /path/"$NORMAL_SAMPLE".bam
done <<'EOD'
sample_1 sample_2
sample_3 sample_2
sample_4 sample_5
sample_6 sample_5
EOD
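The "insert sanity checks here" placeholder could be filled in along these lines. This is only a sketch: the function name run_from_list is hypothetical, the rule "exactly two fields per line" is an assumption about the data, and the leading echo turns each call into a dry run that prints the command instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "TARGET NORMAL" pairs from stdin and
# prints one command per valid pair (dry run).
run_from_list() {
    while read -r TARGET_SAMPLE NORMAL_SAMPLE ANYTHING_ELSE; do
        # Skip blank lines and lines starting with '#'.
        case $TARGET_SAMPLE in
            ''|'#'*) continue ;;
        esac
        # Assumed rule: a line must contain exactly two fields.
        if [ -z "$NORMAL_SAMPLE" ] || [ -n "$ANYTHING_ELSE" ]; then
            echo "skipping malformed line: $TARGET_SAMPLE $NORMAL_SAMPLE $ANYTHING_ELSE" >&2
            continue
        fi
        # Export so child processes see the current pair.
        export TARGET_SAMPLE NORMAL_SAMPLE
        # Remove the leading echo to actually run the script.
        echo myscript.pl -output_directory "/path/output_${TARGET_SAMPLE}_vs_${NORMAL_SAMPLE}" \
            -target_sample "/path/${TARGET_SAMPLE}.bam" \
            -normal_sample "/path/${NORMAL_SAMPLE}.bam"
    done
}

run_from_list <<'EOD'
sample_1 sample_2
sample_3 sample_2
sample_4 sample_5
sample_6 sample_5
EOD
```

A readability check on the input files, such as [ -r "/path/$TARGET_SAMPLE.bam" ], could be added in the same place once the real paths are known.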

Method 1b as Method 1 but read data from an external file:

# special characters in the values (eg. space) will cause problems
cat >mydata <<'EOD'
sample_1 sample_2
sample_3 sample_2
sample_4 sample_5
sample_6 sample_5
EOD

export TARGET_SAMPLE NORMAL_SAMPLE

# normally $ANYTHING_ELSE should be empty but embedded spaces will confuse read
cat mydata | while read -r TARGET_SAMPLE NORMAL_SAMPLE ANYTHING_ELSE; do
    # insert sanity checks here
    myscript.pl -output_directory /path/output_"$TARGET_SAMPLE"_vs_"$NORMAL_SAMPLE" -target_sample /path/"$TARGET_SAMPLE".bam -normal_sample /path/"$NORMAL_SAMPLE".bam
done

Method 2 wrapping with a shell function:

export TARGET_SAMPLE NORMAL_SAMPLE

wrapper(){
    TARGET_SAMPLE=$1
    NORMAL_SAMPLE=$2
    # insert sanity checks here
    myscript.pl -output_directory /path/output_"$TARGET_SAMPLE"_vs_"$NORMAL_SAMPLE" -target_sample /path/"$TARGET_SAMPLE".bam -normal_sample /path/"$NORMAL_SAMPLE".bam
}

wrapper "sample_1" "sample_2"
wrapper "sample_3" "sample_2"
wrapper "sample_4" "sample_5"
wrapper "sample_6" "sample_5"

Method 3 using for loop over multiple arrays:

Bash has indexed array variables, so a for loop over two arrays is possible. However, keeping the arrays synchronised is error-prone, so I don't recommend it.
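For completeness, a minimal sketch of what the parallel-array loop might look like, assuming Bash (arrays are not available in plain sh). The array names target and normal follow the comments under the question; the function name run_pairs is hypothetical, and the leading echo makes it a dry run. The length check catches only one kind of mismatch: a pair entered in the wrong position would still go unnoticed.

```shell
#!/usr/bin/env bash
# Parallel arrays: element i of target is paired with element i of normal.
target=(sample_1 sample_3 sample_4 sample_6)
normal=(sample_2 sample_2 sample_5 sample_5)

# Hypothetical helper: prints one command per index (dry run).
run_pairs() {
    # Refuse to run if the arrays have drifted out of sync in length.
    if [ "${#target[@]}" -ne "${#normal[@]}" ]; then
        echo "target and normal differ in length" >&2
        return 1
    fi
    local i
    # "${!target[@]}" expands to the indices 0 1 2 3.
    for i in "${!target[@]}"; do
        export TARGET_SAMPLE=${target[i]}
        export NORMAL_SAMPLE=${normal[i]}
        # Remove the leading echo to actually run the script.
        echo myscript.pl -output_directory "/path/output_${TARGET_SAMPLE}_vs_${NORMAL_SAMPLE}" \
            -target_sample "/path/${TARGET_SAMPLE}.bam" \
            -normal_sample "/path/${NORMAL_SAMPLE}.bam"
    done
}

run_pairs
```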

3 Comments

cat file | while read ... is better written while read; do ... done < file
Method 1 works great. However, I sometimes have to submit the command to LSF using bsub, and it doesn't work unless I export the variables with export TARGET_SAMPLE="sample_1". Is there a way to export the variables using Method 1? Apologies if that is not very clear.
Point taken. The cat is intended as a placeholder. It could have been something more complicated like the output of grep or a database query or a longer pipeline.