I have 44 .tsv files in one folder and I want to count the intersections for every pair of files using the intersect command of bedtools. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one at a time, but when I use parallel to process all pairs at once I get a syntax error.

Here is the code and the result when I run a single pair manually:

$ bedtools intersect -a p1.tsv -b p2.tsv -c
chr1    1   5   1
chr1    8   12  1
chr1    18  20  1
chr1    21  25  0

$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'
3

Here is the code and the result when I use parallel processing:

$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`

awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error

The result should be 44*44 files, each containing a single value, for example just 3.

3 Answers

@DudiBoy has a good solution, but I find it annoying to have to create another file just to call GNU Parallel.

You can use a shell function instead, so no new file is needed:

doit() {
  bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'
}
export -f doit

parallel --results {1}.{2}.intersect doit {1} {2} ::: *.tsv ::: *.tsv
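
The `export -f` line matters because the function has to be visible to the shell that runs each job. The following is a minimal sketch of that mechanism only, assuming bash; it does not call bedtools or GNU Parallel. `sum_col4` is a hypothetical stand-in for `doit`, and the sample rows are the four-column counts shown in the question.

```shell
#!/usr/bin/env bash
# Stand-in for the bedtools output: the four rows from the question.
sample=$(mktemp)
printf 'chr1\t1\t5\t1\nchr1\t8\t12\t1\nchr1\t18\t20\t1\nchr1\t21\t25\t0\n' > "$sample"

# Hypothetical helper: sums column 4 of a file, like the awk step in doit.
sum_col4() {
  awk '{sum+=$4} END {print sum}' "$1"
}
# Without this line the child shell below would report "command not found".
export -f sum_col4

# A child bash can call the exported function by name.
result=$(bash -c 'sum_col4 "$1"' _ "$sample")
echo "$result"   # prints 3

rm -f "$sample"
```

Because the awk program sits inside the function body in single quotes, `$4` never passes through an extra layer of shell quoting.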

I think you need to escape the special characters so that your shell passes them through unchanged, like this:

parallel bedtools intersect -a {1} -b {2} -c \| awk \'{sum+=\$4} END{print sum+0}\' \> {1}.{2}.intersect ::: *tsv ::: *tsv
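
The error message in the question shows `{sum+=}`, which suggests the `$4` was lost before awk ever ran. Here is a small sketch of why, using only bash and awk (no bedtools or GNU Parallel). The variable names are made up for illustration. It also shows what the `sum+0` in the command above changes when the input is empty.

```shell
#!/usr/bin/env bash
# Clear the positional parameters so $4 is empty, as in a typical interactive shell.
set --

# Inside double quotes the calling shell expands $4 immediately.
unescaped="awk '{sum+=$4} END {print sum}'"
# Escaping the dollar sign keeps $4 literal for the command that runs later.
escaped="awk '{sum+=\$4} END {print sum}'"

printf '%s\n' "$unescaped"   # prints: awk '{sum+=} END {print sum}'
printf '%s\n' "$escaped"     # prints: awk '{sum+=$4} END {print sum}'

# With empty input, "print sum" gives an empty line, "print sum+0" gives 0.
empty_plain=$(printf '' | awk '{sum+=$4} END {print sum}')
empty_zero=$(printf '' | awk '{sum+=$4} END {print sum+0}')
printf 'plain=[%s] zero=[%s]\n' "$empty_plain" "$empty_zero"   # prints: plain=[] zero=[0]
```

The first printed line matches the broken program that awk complained about in the question.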

I believe @MarkSetchell's answer is valid. You can also clean things up by moving the complicated command line into a bash script that you can test on its own.

intersect.bash

 #!/bin/bash
 bedtools intersect -a "$1" -b "$2" -c | awk '{sum+=$4} END {print sum}'

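Make intersect.bash executable (chmod +x intersect.bash) and test that it works correctly on one pair, then run it in parallel:

parallel ./intersect.bash {1} {2} '>' {1}.{2}.intersect ::: *.tsv ::: *.tsv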

Good luck.
