I have this really ugly (but functional) MySQL SELECT statement with multiple subqueries:

select myt_taxon_list, count(myt_taxon_list)
from (select t1.taxon_list as myt_taxon_list, t1.query_prot_id,
             count(t1.query_prot_id) / (select count(t.id)
                                        from taxa t, stax st
                                        where t.systematics like concat('%; ', st.taxon_list, '%')
                                              and t.taxon_name not like '%thaliana%'
                                              and st.taxon_list = t1.taxon_list
                                        group by t1.taxon_list) as taxfrac
      from task1 t1
      where t1.taxon_list = 'Stramenopiles'
      group by t1.query_prot_id) myt 
where taxfrac > 0.5

In the inner query's WHERE clause you can see 'Stramenopiles'. The result of this is a single count. Now I want to write a bash script that outputs the count not only for Stramenopiles but for every entry in that column, not as one sum but separately for each entry. I assume I need to combine this with a loop in bash, but I have never written a script before. Can someone help me with this?

  • This seems to be two questions in one. You first need to extract the entries in the column somehow; only then can you write the loop that goes over them. Do you know how to do the first part? Commented Mar 1, 2013 at 13:51
  • Thank you for the clarification! I really don't know how to do that unfortunately :( Commented Mar 1, 2013 at 14:06
  • What about SELECT DISTINCT taxon_list FROM task1? Commented Mar 1, 2013 at 14:08
  • Where do I have to put it? Do you mean instead of "t1.taxon_list = 'Stramenopiles'"? Commented Mar 1, 2013 at 14:12
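A minimal sketch of the two steps from these comments: first SELECT DISTINCT taxon_list FROM task1, then the original query once per value, with the hard-coded 'Stramenopiles' replaced by a parameter. The question asks for bash; this sketch swaps in Python's standard sqlite3 module so it runs on its own without a MySQL server. The table and column names come from the question, but the rows, the build_demo_db and counts_per_taxon_loop helpers, and the SQLite adjustments (|| instead of concat(), * 1.0 to avoid integer division) are all hypothetical.

```python
import sqlite3

# Hypothetical demo data: table/column names follow the question,
# but every row here is invented for illustration only.
def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE stax  (taxon_list TEXT);
        CREATE TABLE taxa  (id INTEGER, taxon_name TEXT, systematics TEXT);
        CREATE TABLE task1 (taxon_list TEXT, query_prot_id TEXT);
        INSERT INTO stax VALUES ('Stramenopiles'), ('Fungi');
        INSERT INTO taxa VALUES
            (1, 'Phytophthora infestans',   'Eukaryota; Stramenopiles; Oomycetes'),
            (2, 'Thalassiosira pseudonana', 'Eukaryota; Stramenopiles; Bacillariophyta'),
            (3, 'Saccharomyces cerevisiae', 'Eukaryota; Opisthokonta; Fungi; Ascomycota'),
            (4, 'Arabidopsis thaliana',     'Eukaryota; Viridiplantae; Streptophyta');
        INSERT INTO task1 VALUES
            ('Stramenopiles', 'P1'), ('Stramenopiles', 'P1'),
            ('Stramenopiles', 'P2'),
            ('Stramenopiles', 'P3'), ('Stramenopiles', 'P3'),
            ('Fungi', 'P1'), ('Fungi', 'P4'), ('Fungi', 'P5');
    """)
    return con

# The question's query with 'Stramenopiles' replaced by a ? placeholder.
# SQLite adaptations (assumptions, not MySQL syntax):
#   * '||' instead of MySQL's concat()
#   * '* 1.0' because SQLite's '/' on two integers is integer division
#   * the inner "group by t1.taxon_list" is omitted for simplicity
PER_TAXON_QUERY = """
    SELECT myt_taxon_list, COUNT(myt_taxon_list)
    FROM (SELECT t1.taxon_list AS myt_taxon_list, t1.query_prot_id,
                 COUNT(t1.query_prot_id) * 1.0 /
                   (SELECT COUNT(t.id)
                    FROM taxa t, stax st
                    WHERE t.systematics LIKE ('%; ' || st.taxon_list || '%')
                      AND t.taxon_name NOT LIKE '%thaliana%'
                      AND st.taxon_list = t1.taxon_list) AS taxfrac
          FROM task1 t1
          WHERE t1.taxon_list = ?
          GROUP BY t1.query_prot_id) myt
    WHERE taxfrac > 0.5
"""

def counts_per_taxon_loop(con: sqlite3.Connection) -> dict[str, int]:
    # Step 1: collect the distinct taxon_list values, as suggested in the comments.
    taxon_lists = [row[0] for row in
                   con.execute("SELECT DISTINCT taxon_list FROM task1 ORDER BY taxon_list")]
    # Step 2: run the per-taxon query once per value, keeping each count separate.
    counts = {}
    for taxon in taxon_lists:
        _, n = con.execute(PER_TAXON_QUERY, (taxon,)).fetchone()
        counts[taxon] = n
    return counts

if __name__ == "__main__":
    for taxon, n in counts_per_taxon_loop(build_demo_db()).items():
        print(taxon, n)
```

A bash script would follow the same pattern (read the distinct values, then call the query inside a loop), but as the answer below points out, a single GROUP BY query makes the loop unnecessary.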

1 Answer

I wouldn't do this in bash at all. If you remove

where t1.taxon_list = 'Stramenopiles'

you can add group by myt_taxon_list to your outer select; the inner select then has to group by both t1.query_prot_id and t1.taxon_list:

select myt_taxon_list, count(myt_taxon_list)
from (select t1.taxon_list as myt_taxon_list, t1.query_prot_id,
             count(t1.query_prot_id) / (select count(t.id)
                                        from taxa t, stax st
                                        where t.systematics like concat('%; ', st.taxon_list, '%')
                                              and t.taxon_name not like '%thaliana%'
                                              and st.taxon_list = t1.taxon_list
                                        group by t1.taxon_list) as taxfrac
      from task1 t1
      group by t1.query_prot_id, t1.taxon_list) myt 
where taxfrac > 0.5
group by myt_taxon_list

which gives you a separate count for every taxon_list entry in a single query.
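To sanity-check the grouped query, here is a hedged sketch that runs it against invented toy rows. It again uses Python's sqlite3 module instead of MySQL, with the same assumed SQLite adjustments (|| for concat(), * 1.0 against integer division, the inner group by t1.taxon_list omitted) and a hypothetical build_demo_db helper. Protein P1 appears under both Fungi and Stramenopiles, which is the situation raised in the comments: grouping the inner select by query_prot_id alone would merge rows from both taxa.

```python
import sqlite3

# Hypothetical demo data: names follow the question, rows are invented.
def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE stax  (taxon_list TEXT);
        CREATE TABLE taxa  (id INTEGER, taxon_name TEXT, systematics TEXT);
        CREATE TABLE task1 (taxon_list TEXT, query_prot_id TEXT);
        INSERT INTO stax VALUES ('Stramenopiles'), ('Fungi');
        INSERT INTO taxa VALUES
            (1, 'Phytophthora infestans',   'Eukaryota; Stramenopiles; Oomycetes'),
            (2, 'Thalassiosira pseudonana', 'Eukaryota; Stramenopiles; Bacillariophyta'),
            (3, 'Saccharomyces cerevisiae', 'Eukaryota; Opisthokonta; Fungi; Ascomycota'),
            (4, 'Arabidopsis thaliana',     'Eukaryota; Viridiplantae; Streptophyta');
        INSERT INTO task1 VALUES
            ('Stramenopiles', 'P1'), ('Stramenopiles', 'P1'),
            ('Stramenopiles', 'P2'),
            ('Stramenopiles', 'P3'), ('Stramenopiles', 'P3'),
            ('Fungi', 'P1'), ('Fungi', 'P4'), ('Fungi', 'P5');
    """)
    return con

# The answer's grouped query, adapted to SQLite (assumptions):
#   * '||' replaces MySQL's concat()
#   * '* 1.0' avoids SQLite's integer division of two counts
#   * the inner "group by t1.taxon_list" is omitted for simplicity
#   * ORDER BY is added only to make the output order predictable
GROUPED_QUERY = """
    SELECT myt_taxon_list, COUNT(myt_taxon_list)
    FROM (SELECT t1.taxon_list AS myt_taxon_list, t1.query_prot_id,
                 COUNT(t1.query_prot_id) * 1.0 /
                   (SELECT COUNT(t.id)
                    FROM taxa t, stax st
                    WHERE t.systematics LIKE ('%; ' || st.taxon_list || '%')
                      AND t.taxon_name NOT LIKE '%thaliana%'
                      AND st.taxon_list = t1.taxon_list) AS taxfrac
          FROM task1 t1
          GROUP BY t1.query_prot_id, t1.taxon_list) myt
    WHERE taxfrac > 0.5
    GROUP BY myt_taxon_list
    ORDER BY myt_taxon_list
"""

def counts_per_taxon(con: sqlite3.Connection) -> list[tuple[str, int]]:
    # One result row per taxon_list value; no loop over taxa is needed.
    return con.execute(GROUPED_QUERY).fetchall()

if __name__ == "__main__":
    for taxon, n in counts_per_taxon(build_demo_db()):
        print(taxon, n)
```

With these made-up rows, P2 reaches a fraction of exactly 0.5 for Stramenopiles and is dropped by taxfrac > 0.5, so Stramenopiles ends up with 2 proteins and Fungi with 3.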

  • I tried this before; the result is a wrong count and many column entries are missing =/
  • @Wicked_sue Then how are taxon_list and query_prot_id related? This should be a 1<->1 relationship.
  • Ahh, that seems to be the problem. Each taxon_list entry has more than one query_prot_id entry. I am interested in the distinct count of their occurrences.
  • @Wicked_sue Then you must group by both taxon_list and query_prot_id; otherwise you get arbitrary results. Please see the modified answer.
  • It works perfectly! Thank you very, very much for the quick answers; I had been working on this for 3 days and was about to go mad :D