I'm trying to adapt the following lines of code for use with GNU parallel:

for ID in $(cut -f1 markers.tsv); do
    echo "$ID"
    FAA="${ID}.faa.gz"
    zcat "${FAA}" | muscle -out "${ID}.msa"
done

Preferably without creating an intermediate script.

However, the examples I'm seeing here do not show where I can use my ${ID} argument.

This could be a one-liner:

for ID in $(cut -f1 markers.tsv); do echo "$ID" && FAA="${ID}.faa.gz" && zcat "${FAA}" | muscle -out "${ID}.msa"; done

I'm trying this, but it does not appear to run the jobs simultaneously:

cut -f1 markers.tsv | parallel -j 16 -I @ 'echo "@" && FAA="@.faa.gz" && zcat $FAA | muscle -out @.msa'

Can someone help me adapt this using 16 jobs correctly?

Example markers.tsv

PF00709.21\t1\ta
PF00406.22\t2\tb
PF01808.18\t3\tc
  • Put the commands in a script, and run the script with parallel. Commented Oct 2, 2021 at 0:58
  • Is there a way to do it without an intermediate script? Commented Oct 2, 2021 at 0:58
  • Probably. But doing it with a script will almost certainly be easier. Commented Oct 2, 2021 at 0:59

2 Answers

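Due to a bug in GNU Parallel, an input line cannot be longer than the maximum command line length. Extracting the first column with cut before piping to parallel keeps the long columns out of the input: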

cut -f1 markers.tsv |
  parallel -j16 'echo {} && zcat {}.faa.gz | muscle -out {}.msa'
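
Before launching 16 real jobs, it can help to check what each job would run. The following is only a sketch under a few assumptions: markers_sample.tsv and its contents are a made-up stand-in for the real file, the guard is there so the snippet does nothing on machines without GNU Parallel, and --dry-run prints each command instead of executing it, so muscle and the .faa.gz files are not needed.

```shell
# Hypothetical sample with the same layout as markers.tsv (tab-separated).
printf 'PF00709.21\t1\ta\nPF00406.22\t2\tb\nPF01808.18\t3\tc\n' > markers_sample.tsv

# Run only if GNU Parallel is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # parallel reads its arguments from stdin when no -a or ::: is given,
    # so only the first column ever reaches it.
    cut -f1 markers_sample.tsv |
      parallel --dry-run -j16 'echo {} && zcat {}.faa.gz | muscle -out {}.msa'
fi
```

Dropping --dry-run and pointing cut at the real markers.tsv gives the command above.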

Something like

parallel --jobs 16 -a markers.tsv -C '\t' 'echo {1} && zcat {1}.faa.gz | muscle -out {1}.msa'

should work. This uses markers.tsv as the input file with tab-separated columns, and replaces {1} in the command with the value of the first column of each line.
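
To see how -C maps columns onto {1}, {2} and so on, a dry run on a small file may be useful. This is a sketch, not part of the original answer: markers_cols.tsv and the id=/count=/tag= labels are invented for illustration, and the guard skips the parallel call where GNU Parallel is not installed.

```shell
# Hypothetical two-line sample with three tab-separated columns.
printf 'PF00709.21\t1\ta\nPF00406.22\t2\tb\n' > markers_cols.tsv

# Run only if GNU Parallel is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -C '\t' splits each input line on tabs; {1}, {2}, {3} are the columns.
    # --dry-run prints the resulting commands instead of running them.
    parallel --dry-run -a markers_cols.tsv -C '\t' 'echo id={1} count={2} tag={3}'
fi
```

Note that with -a the whole line is still read before it is split, so very long columns can still run into the command line length limit.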

Since it sounds like the columns are really, really long, and you're running into maximum command line length restrictions, you might have more luck putting the bulk of what you want to do in a function (or script file):

# Assuming bash
dowork() {
    echo "$1"
    zcat "$1.faa.gz" | muscle -out "$1.msa"
}
export -f dowork
parallel --jobs 16 -a markers.tsv -C '\t' dowork '{1}'

Comments

Thanks for your suggestion. I tried running it, but it didn't end up working. Maybe I can tweak a few things. Is -C the delimiter for the -a table? Does sh refer to the shell, and is this a positional argument?
@O.rka What does "didn't end up working" mean? Consult the man page if any of the options are unfamiliar, and man sh for what sh -c arg does, of course.
I was trying it on a sample, just to echo: parallel -a markers.tsv -q -j 4 -C '\t' sh -c 'echo "{1}"' but I got this error: parallel: Error: Command line too long (137055 >= 131063) at input 0:
@O.rka Interesting. Add a sample of your markers.tsv file to your question.
I've made a trimmed version because my actual markers.tsv file has HUGE columns. Is there a way to do cut -f1 markers.tsv and somehow tell -a that it is from stdin?