I have a SQL statement that does nearly what I want. What I need is to find each genus for which
pindent > 60 and coverage > 60 hold for both qseqid values. I think I need some type of join, maybe like in this question

Here is what I have now, which does not achieve the result I want.

SELECT qseqid, genus, species, txid, sgi, pindent, coverage 
FROM vmdavis.insecta10000
WHERE pindent > 60
AND coverage > 60
AND qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677")
ORDER BY  genus, species, qseqid, coverage, pindent;

Here is an example of why this does not work. Anchon meets the above criteria for qseqid dia...040 but not for dia...677, so I would not want this row.

| diaci0.9_transcript_99990000013040 | Anchon           | sp. NYSM 95-02-01-35          |  265052 |   6467730 |   80.93 |  61.7597 |

Here is a sample of the table

mysql> SELECT qseqid, genus, species, txid, pindent, coverage FROM vmdavis.insecta10000 limit 5;
+------------------------------------+---------+-------------+--------+---------+----------+
| qseqid                             | genus   | species     | txid   | pindent | coverage |
+------------------------------------+---------+-------------+--------+---------+----------+
| diaci0.9_transcript_99990000000055 | Apis    | florea      |   7463 |    97.5 |  2.58107 |
| diaci0.9_transcript_99990000000055 | Bombus  | impatiens   | 132113 |    97.5 |   3.3534 |
| diaci0.9_transcript_99990000000055 | Nasonia | vitripennis |   7425 |    97.5 |  1.58343 |
| diaci0.9_transcript_99990000000055 | Bombus  | terrestris  |  30195 |    97.5 |  3.41207 |
| diaci0.9_transcript_99990000000055 | Apis    | mellifera   |   7460 |    97.5 |  2.88889 |
+------------------------------------+---------+-------------+--------+---------+----------+

Here is an example of the desired output. Genus Agetocera is listed twice because it meets the pindent and coverage criteria for both qseqid values. Neither of these rows should be listed if Agetocera did not meet the conditions pindent > 60 and coverage > 60 for both qseqid values.

| qseqid                             | genus     | species  | txid   | sgi       | pindent | coverage |
| diaci0.9_transcript_99990000013040 | Agetocera | mirablis | 715820 | 291191497 |   82.37 |  60.7963 |
| diaci0.9_transcript_99990000022677 | Agetocera | mirablis | 909986 | 309755769 |   77.52 |  78.6269 |

I am very new to MySQL, and I assume the answer to this question probably exists on Stack Overflow. I just don't know what to search for, and I might not understand the solutions if I found them. If the question can be asked better or you can suggest a better title, I will update it.

7
  • can you provide your table structure? also provide some sample data and your expected result Commented Feb 4, 2013 at 5:16
  • When you say you need to find genus, do you mean just SELECT DISTINCT genus? What problems are you having? Commented Feb 4, 2013 at 5:23
  • @sgeddes I want rows from the db only if the pindent and coverage criteria are met for both qseqid values. I still want the rows for both genus and qseqid, so I don't want to group the output. Commented Feb 4, 2013 at 5:30
  • All your pindent and coverage values follow your criteria (i.e. your select does what you want it to do). Commented Feb 4, 2013 at 5:31
  • What is your desired output from your above example? I don't see an issue with your current implementation. Commented Feb 4, 2013 at 5:32

2 Answers

1

Try something like this -- uses a subquery to get only the desired genus:

SELECT *
FROM insecta10000 i 
  JOIN 
  (
  SELECT genus
  FROM insecta10000
  WHERE pindent > 60
    AND coverage > 60
    AND qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677")
  GROUP BY genus
  HAVING COUNT(*) = 2
  ) i2 on i.genus = i2.genus 

And here is the SQL Fiddle.

Good luck.

4 Comments

Looks promising. I'm not sure why, but it appears to return every qseqid. See this updated SQL Fiddle: sqlfiddle.com/#!2/664b3/1
If you only want to return the ...040 and ...677 rows, try adding those qseqid values to your outer WHERE clause as well: sqlfiddle.com/#!2/664b3/9
I wish :-) Apis should not be returned in this example because it never has the qseqid ...677. Here is an updated fiddle: sqlfiddle.com/#!2/e5cb0/1
FYI this is what I ended up with (I think I can edit your answer directly but maybe better if you do) : SELECT * FROM vmdavis.insecta10000 i JOIN ( SELECT genus FROM vmdavis.insecta10000 WHERE pindent > 50 AND coverage > 50 AND qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677") GROUP BY genus HAVING COUNT(distinct qseqid) = 2 ) i2 on i.genus = i2.genus WHERE qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677") ORDER BY i.genus;
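
For anyone reading later, here is a minimal sketch of why HAVING COUNT(*) = 2 let Apis through while HAVING COUNT(DISTINCT qseqid) = 2 does not. It is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the table is cut down to four columns, and all rows except the Anchon and Agetocera values quoted in the question are invented. The thresholds are the 60/60 from the question, not the 50/50 in the comment above.

```python
import sqlite3

QSEQIDS = (
    "diaci0.9_transcript_99990000013040",
    "diaci0.9_transcript_99990000022677",
)

# Hypothetical sample rows: (qseqid, genus, pindent, coverage).
ROWS = [
    (QSEQIDS[0], "Anchon", 80.93, 61.7597),     # passes for ...040
    (QSEQIDS[1], "Anchon", 55.0, 40.0),         # invented: fails for ...677
    (QSEQIDS[0], "Agetocera", 82.37, 60.7963),  # passes for ...040
    (QSEQIDS[1], "Agetocera", 77.52, 78.6269),  # passes for ...677
    (QSEQIDS[0], "Apis", 90.0, 70.0),           # invented: two passing hits,
    (QSEQIDS[0], "Apis", 85.0, 65.0),           # but both for ...040 only
]

# Same shape as the query in the comment above; only the HAVING
# expression is swapped between the two runs.
QUERY = """
SELECT i.qseqid, i.genus
FROM insecta10000 AS i
JOIN (
    SELECT genus
    FROM insecta10000
    WHERE pindent > 60 AND coverage > 60 AND qseqid IN (?, ?)
    GROUP BY genus
    HAVING {having} = 2
) AS i2 ON i.genus = i2.genus
WHERE i.qseqid IN (?, ?)
ORDER BY i.genus, i.qseqid
"""


def genera(having):
    """Return the sorted genus names selected with the given HAVING expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE insecta10000 "
        "(qseqid TEXT, genus TEXT, pindent REAL, coverage REAL)"
    )
    conn.executemany("INSERT INTO insecta10000 VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY.format(having=having), QSEQIDS + QSEQIDS).fetchall()
    conn.close()
    return sorted({genus for _, genus in rows})


count_star = genera("COUNT(*)")
count_distinct = genera("COUNT(DISTINCT qseqid)")
print(count_star)      # ['Agetocera', 'Apis']
print(count_distinct)  # ['Agetocera']
```

COUNT(*) counts passing rows, so two hits against the same qseqid look the same as one hit against each. COUNT(DISTINCT qseqid) counts how many of the two qseqid values had at least one passing row, which is the condition the question asks for.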
0

If you want the records that satisfy both coverage > 60 and pindent > 60, your query already does that. However, if you are looking for the union of the records that satisfy coverage and pindent separately, then try this:

SELECT * FROM (
SELECT qseqid, genus, species, txid, sgi, pindent, coverage 
FROM vmdavis.insecta10000
WHERE pindent > 60    
UNION
SELECT qseqid, genus, species, txid, sgi, pindent, coverage 
FROM vmdavis.insecta10000
WHERE coverage > 60) x
WHERE x.qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677")
ORDER BY  x.genus, x.species, x.qseqid, x.coverage, x.pindent
;

Now that you have given the expected output (although the columns differ slightly):

http://sqlfiddle.com/#!2/f89ce/4

SELECT qseqid, genus, species, txid, indent, coverage 
FROM demo
WHERE indent > 60
AND coverage > 60
AND qseqid in ("diaci0.9_transcript_99990000013040", "diaci0.9_transcript_99990000022677")
ORDER BY  genus, species, qseqid, coverage, indent;

|                             QSEQID |     GENUS |  SPECIES |   TXID | INDENT | COVERAGE |
------------------------------------------------------------------------------------------
| diaci0.9_transcript_99990000013040 | Agetocera | mirablis | 715820 |  82.37 |  60.7963 |
| diaci0.9_transcript_99990000022677 | Agetocera | mirablis | 909986 |  77.52 |  78.6269 |

6 Comments

Why is the UNION necessary? Couldn't you just use OR? Still confused about OP request though.
@sgeddes I thought of using OR, but it seems that's not what he wants, because OR will cover a record that satisfies either one condition or both. Looking at his query, he already has a way to satisfy both. So the remaining possibility (big assumption) could be a UNION that gets the records separately. Odd, though.
@bonCodigo maybe a union is what I want but not on pindent and coverage but on qseqid. I'll play with that a bit.
OK, I tried this (below). It is not what I wanted, and it is not really clear to me how it differs from what I have. SELECT qseqid, genus, species, txid, sgi, pindent, coverage FROM vmdavis.insecta10000 WHERE pindent > 60 AND coverage > 60 AND qseqid = "diaci0.9_transcript_99990000013040" UNION SELECT qseqid, genus, species, txid, sgi, pindent, coverage FROM vmdavis.insecta10000 WHERE pindent > 60 AND coverage > 60 AND qseqid = "diaci0.9_transcript_99990000022677" ORDER BY genus, species, qseqid, coverage, pindent;
Frankly, looking at your expected output, there doesn't seem to be an issue. Please check here: sqlfiddle.com/#!2/f89ce/4. Could it be that you have two different fields, pindent and indent?
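
On the UNION tried in the comments above: a small sketch suggesting why it looked no different from the original query. A UNION of two SELECTs that each filter on a single qseqid returns the same rows as one SELECT with qseqid IN (...), so a genus that passes for only one qseqid still appears. The assumptions here are that SQLite (via Python's sqlite3) stands in for MySQL, the table is cut down to four columns, and the Bombus row is invented for illustration.

```python
import sqlite3

Q1 = "diaci0.9_transcript_99990000013040"
Q2 = "diaci0.9_transcript_99990000022677"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insecta10000 "
    "(qseqid TEXT, genus TEXT, pindent REAL, coverage REAL)"
)
conn.executemany(
    "INSERT INTO insecta10000 VALUES (?, ?, ?, ?)",
    [
        (Q1, "Anchon", 80.93, 61.7597),     # passes, but only for ...040
        (Q1, "Agetocera", 82.37, 60.7963),  # passes for ...040
        (Q2, "Agetocera", 77.52, 78.6269),  # passes for ...677
        (Q2, "Bombus", 97.5, 3.3534),       # invented: fails on coverage
    ],
)

# The original approach: one SELECT with an IN list.
IN_QUERY = """
SELECT qseqid, genus, pindent, coverage FROM insecta10000
WHERE pindent > 60 AND coverage > 60 AND qseqid IN (?, ?)
ORDER BY genus, qseqid
"""

# The approach from the comment: one SELECT per qseqid, glued with UNION.
UNION_QUERY = """
SELECT qseqid, genus, pindent, coverage FROM insecta10000
WHERE pindent > 60 AND coverage > 60 AND qseqid = ?
UNION
SELECT qseqid, genus, pindent, coverage FROM insecta10000
WHERE pindent > 60 AND coverage > 60 AND qseqid = ?
ORDER BY genus, qseqid
"""

in_rows = conn.execute(IN_QUERY, (Q1, Q2)).fetchall()
union_rows = conn.execute(UNION_QUERY, (Q1, Q2)).fetchall()
conn.close()

print(in_rows == union_rows)                      # same rows either way
print([genus for _, genus, _, _ in union_rows])   # Anchon is still listed
```

Neither form compares the two qseqid values against each other per genus, which is why the "both qseqid" requirement needs grouping by genus (or a self-join). One caveat: UNION also removes exact duplicate rows, so the two forms could differ on a table that contains fully identical rows.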