
I've trawled through similar questions to this, but can't quite find something that works, and wondered if you could kindly help.

I am trying to execute the following bash script:

#!/bin/bash
for i in {0..10000}
do
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
Select 
 S.name as rsID,
 S.chrom,
 S.chromEnd,
 S.chromStart,
 K.txStart,
 K.txEnd, 
 G.geneSymbol,
 G.kgID 
from snp138 as S
left join knownGene as K on
 (S.chrom=K.chrom and not(K.txEnd+1000000<S.chromStart or S.chromEnd+1000000<K.txStart))
right join kgXref as G on
 (K.name=G.kgID)
where
  S.name in ("snp_permutation"$i".txt")'| awk -F '|' '{print $1}' | sed 1d | awk '{print $7}' | sort -u > permutation"$i"_genelist.txt
done

Essentially, I have 10,000 files called snp_permutation"$i".txt, where i runs from 1 to 10,000. Each of these files is just a single line, and its only content looks something like:

rs16574876

For this script to work, I need the actual content of each file (e.g. "rs16574876") to go between the quotation marks in S.name in ("snp_permutation"$i".txt")', rather than the name of the file itself.

Do I need to use source or export for this?

Thank you for your help.

    Save yourself a ton of trouble by starting small and building out. Currently you have to simultaneously debug bash, awk, sed and mysql. Instead start with something trivial like i=1; echo 'The value is ("snp_permutation"$i".txt)' and then solve each problem until you get it to write The value is (42). But to answer your question: no, you do not need to (and can't) use source or export for this. Commented Apr 18, 2018 at 16:57
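A minimal sketch of that "start small" step, with no database involved. It assumes a throwaway directory and a file holding 42, both made up for illustration, and shows why the original line prints the file name: inside single quotes bash expands nothing, while inside double quotes $(<file) is replaced by the file's contents.

```shell
#!/bin/bash
# Throwaway directory and file contents are assumptions for this demo only
workdir=$(mktemp -d)
i=1
printf '42\n' > "$workdir/snp_permutation${i}.txt"

# Single quotes: nothing is expanded, so $i and the file name stay literal
literal='The value is ("snp_permutation"$i".txt")'

# Double quotes: $(<file) expands to the contents of the file
expanded="The value is ($(< "$workdir/snp_permutation${i}.txt"))"

echo "$literal"
echo "$expanded"
rm -rf "$workdir"
```

Once the second line prints the value you expect, the same double-quoted expansion can be moved into the mysql command.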

1 Answer

How about this:

#!/bin/bash

template=$(cat <<'END'
    Select 
        S.name as rsID,
        S.chrom,
        S.chromEnd,
        S.chromStart,
        K.txStart,
        K.txEnd, 
        G.geneSymbol,
        G.kgID 
    from snp138 as S
    left join knownGene as K on
        (S.chrom=K.chrom and not(K.txEnd+1000000<S.chromStart or S.chromEnd+1000000<K.txStart))
    right join kgXref as G on
        (K.name=G.kgID)
    where
        S.name in (%s)
END
)

for (( i=0; i <= 10000; i++ )); do
    printf -v sql "$template" "$(< "snp_permutation${i}.txt")"
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e "$sql" |
      awk -F '|' 'NR > 1 {split($1, a, " "); print a[7]}' | 
      sort -u > "permutation${i}_genelist.txt"
done

This uses $(<file), which is bash shorthand for $(cat file) that reads the file without running cat.
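The two building blocks can be tried on their own before involving mysql. This is only a sketch: the file demo_snp.txt and the one-line template are invented for the demo, and the template puts quotes around %s because the ID is a string value in SQL.

```shell
#!/bin/bash
# Hypothetical demo file standing in for one snp_permutation${i}.txt
tmpdir=$(mktemp -d)
printf 'rs16574876\n' > "$tmpdir/demo_snp.txt"

# $(<file) expands to the file's contents with trailing newlines removed
rsid=$(< "$tmpdir/demo_snp.txt")

# printf -v stores the formatted text in the variable sql instead of printing it
template="where S.name in ('%s')"
printf -v sql "$template" "$rsid"

echo "$sql"
rm -rf "$tmpdir"
```

Printing $sql like this before passing it to mysql -e is a cheap way to check what the server will actually receive.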


5 Comments

That looks promising - Will try that. Thank you.
I tried it for the first four permutations, and for each line, I get an error message:
ERROR 1054 (42S22) at line 1: Unknown column 'rs12683614' in 'where clause'
ERROR 1054 (42S22) at line 1: Unknown column 'rs117744418' in 'where clause'
ERROR 1054 (42S22) at line 1: Unknown column 'rs66991551' in 'where clause'
ERROR 1054 (42S22) at line 1: Unknown column 'rs72795184' in 'where clause'
Any ideas?
Oh, the value is not quoted. In the template you would want the last line to read S.name in ('%s'), and you'd need to do more work to protect against SQL injection in the shell for loop.
That's great - seems to work perfectly. Thank you again, Glenn - much appreciated.
Take SQL injection seriously. A snp_permutation file containing foo');drop table snp138;-- would be disastrous. See bobby-tables.com
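One possible way to act on this warning is to check each file's contents before building the query. The sketch below assumes that every valid ID is "rs" followed by digits, as in the examples in this thread; the helper name valid_rsid is made up for illustration and is not part of the answer's script.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if $1 is "rs" followed by one or more digits
valid_rsid() {
    [[ $1 =~ ^rs[0-9]+$ ]]
}

# Try one well-formed ID and the malicious string from the comment above
for candidate in "rs16574876" "foo');drop table snp138;--"; do
    if valid_rsid "$candidate"; then
        echo "ok: $candidate"
    else
        echo "rejected: $candidate" >&2
    fi
done
```

Inside the answer's loop, the file could be read into a variable first and the iteration skipped with continue when valid_rsid fails, so that only checked values ever reach printf -v.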
