
I have a series of files in sub-directories that I want to loop through, process, and name according to the input filename and the various parameters (models) I'm using to process the files.

For example, the file names look like AG005574, AG004788, AG003854 and the parameter/model values look like ATd, PZa, RTK1, so I want to end up with files like AG005574_ATd, AG005574_PZa, AG005574_RTK1, AG004788_ATd, AG004788_PZa, etc. I loop through the subfolders, run the process, and output the results like so:

#!/usr/bin/bash

model=$1
for file in $(find /path/to/files/*/ -type f -name 'AG*.fa');
     do output=${model}"_"${file} ;
        process_call --out=$output."tab" --options ../Path/to/model/$1.hmm $file ;
    echo $file
done

I want to be able to specify the model on the command-line (hence the model=$1). However, my approach does not work in general; I can get the output named by model using

do output=$model ;

but then only the output of the last file processed survives, because each run overwrites the previous one (and the input filename is not used at all). Any help/tutoring is much appreciated.

    You don't need semicolons at the ends of lines in general. Replace output=${model}"_"${file} with output="${model}_${file}". Those are general observations, not answers to your question. Commented Jan 14, 2015 at 15:53

2 Answers


Pass ALL the model names as parameters to the script:

/path/to/script ATd PZa RTK1

then

#!/bin/bash    
find /path/to/files/*/ -type f -name 'AG*.fa' | 
while IFS= read -r file; do
    echo "$file"
    for model in "$@"; do
        output="${file%.fa}_$model.tab"
        process_call --out="$output" --options "../Path/to/model/$model.hmm" "$file"
    done
done
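To check the naming before running the real tool, here is a minimal dry-run sketch. It assumes a single sample path (the xyz subdirectory is hypothetical) and hard-codes the three model names from the question in place of "$@"; it only builds and prints the output names and never calls process_call.

```shell
#!/bin/bash
# Dry run: show the output name each (file, model) pair would get.
# The sample path is an assumption for illustration only.
file=/path/to/files/xyz/AG005574.fa
names=()
for model in ATd PZa RTK1; do
    # ${file%.fa} strips the trailing .fa; then _MODEL.tab is appended.
    output="${file%.fa}_$model.tab"
    names+=("$output")
    echo "$output"
done
```

Because the directory part of the path is kept, each result lands next to its input file, and every file/model pair gets a distinct name, so nothing is overwritten.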

If you already know all the models, you can build that into the script:

#!/bin/bash    
models=( ATd PZa RTK1 )
...
    for model in "${models[@]}"; do
...

I think your problem is that when the file name given by find is:

/path/to/files/xyz/AG002378.fa

your output parameter becomes, for $1 as ATd,

ATd_/path/to/files/xyz/AG002378.fa

instead of:

/path/to/files/xyz/AG002378_ATd

That is, you want the .fa removed, and the _ATd added.

The classic commands for this are dirname and basename:

dir=$(dirname "$file")
base=$(basename "$file" .fa)
output="$dir/${base}_$1"
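As a quick check, here is a sketch that runs those commands on the sample path from above. The variable model is a stand-in for $1 so the snippet runs on its own, and the output name is assembled from the directory and the stripped base name.

```shell
#!/bin/bash
file=/path/to/files/xyz/AG002378.fa
model=ATd                        # stand-in for $1 (assumption for this sketch)

dir=$(dirname "$file")           # directory part: /path/to/files/xyz
base=$(basename "$file" .fa)     # file name with .fa removed: AG002378
output="$dir/${base}_$model"

echo "$output"                   # prints /path/to/files/xyz/AG002378_ATd
```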

The same can be done with shell parameter expansions:

base_with_suffix=${file##*/}
base=${base_with_suffix%.fa}

which do not invoke an external command. The dirname operation can be done too:

dir=${file%/*}

but I think basename and dirname are clearer (though I could be biased by many years of experience during which there wasn't an alternative). There are also edge cases where the string-manipulation expressions don't work well but the commands do; they are unlikely to affect your code.
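To make those edge cases concrete, here is a small sketch comparing dirname with the ${file%/*} expansion on two hypothetical inputs: a bare file name with no directory part, and a file directly under the root directory. Neither arises when find is given a directory path such as /path/to/files/*/, since every result then contains that directory part.

```shell
#!/bin/bash
# Case 1: no slash in the name (hypothetical input).
file=AG002378.fa
dir_cmd=$(dirname "$file")     # "."
dir_exp=${file%/*}             # unchanged, because there is no "/" to match
echo "dirname: '$dir_cmd'  expansion: '$dir_exp'"

# Case 2: file directly under / (hypothetical input).
file=/AG002378.fa
root_cmd=$(dirname "$file")    # "/"
root_exp=${file%/*}            # empty string
echo "dirname: '$root_cmd'  expansion: '$root_exp'"
```

In both cases the expansion gives a value that is not a usable directory, while dirname returns one.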

It is not entirely clear from your question exactly what you want as the output, but variations on the themes shown should allow you to solve the problem.

1 Comment

Thank you Jonathan! I will play around with these.
