SQL query optimizing

Question

Could someone help me optimize my SQL query? The database is PostgreSQL. My table structure looks like this:

CREATE TABLE test_table (
  test_id integer NOT NULL,
  sequence_id integer NOT NULL,
  value1 integer NOT NULL,
  value2 integer NOT NULL,
  CONSTRAINT test_table_pk PRIMARY KEY (test_id, sequence_id)
);

CREATE TABLE test_event (
  event_id integer NOT NULL,
  test_id integer NOT NULL,
  sequence_id integer NOT NULL,
  CONSTRAINT test_event_pk PRIMARY KEY (event_id, test_id, sequence_id)
);

test_table
1,1, 200,300
2,2, 400,500
2,3, 600,700
2,4, 300,500
2,5, 200,900

test_event
1, 1,1
1, 2,2
1, 2,3
2, 2,4
2, 2,5

I want to get every value1 and value2 from test_table where sequence_id and test_id correspond to event_id = 1 in test_event. My query looks like this:

SELECT
  value1, value2
FROM
  test_table
WHERE
  sequence_id IN (
    SELECT sequence_id
    FROM test_event
    WHERE event_id=1) AND
  test_id IN (
    SELECT test_id
    FROM test_event
    WHERE event_id=1)

Can someone please let me know if this is the optimal way of writing this query?

EXPLAIN ANALYZE SELECT... Use real data if you want to know what the query optimizer really thinks. Use fake data if you want to know what the query optimizer fake thinks.
– Mike Sherrill 'Cat Recall', Commented Jul 2, 2013 at 4:03
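
As a minimal sketch of what the comment suggests, the query from the question can be prefixed with EXPLAIN ANALYZE to see the plan PostgreSQL actually chooses. This assumes the tables above exist and hold representative data; the plan and timings will differ from one data set to the next, so no output is shown here.

```sql
-- EXPLAIN ANALYZE runs the query for real and reports the chosen plan
-- together with actual row counts and timings for each plan node.
EXPLAIN ANALYZE
SELECT value1, value2
FROM test_table
WHERE sequence_id IN (SELECT sequence_id FROM test_event WHERE event_id = 1)
  AND test_id IN (SELECT test_id FROM test_event WHERE event_id = 1);
```

Running the same command against each candidate rewrite makes it possible to compare the plans directly rather than guessing which form is faster.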

face · Accepted Answer · 2013-07-02 15:37:58Z

3

You can use an INNER JOIN to optimize your query; this way you won't have to query the test_event table twice on different attributes.

SELECT t.value1, t.value2
FROM test_table t, test_event e 
WHERE e.event_id = 1 
    AND t.test_id = e.test_id 
    AND t.sequence_id = e.sequence_id

EDIT: Added an explicit-join version based on suggestions in the comments.

SELECT t.value1, t.value2  
FROM test_table t INNER JOIN test_event e  
ON ( e.event_id = 1  
    AND t.test_id = e.test_id 
    AND t.sequence_id = e.sequence_id)

edited Jul 2, 2013 at 15:37

answered Jul 1, 2013 at 21:30


5 Comments

Clockwork-Muse Over a year ago

Yes, writing a join will probably yield a more optimal query. However, please don't use the implicit join (comma-separated FROM clause), as it's recommended against on most dbs, or outright deprecated. It also makes dealing with things like LEFT JOINs difficult - it's always best to explicitly qualify your joins. Also, please learn to format your queries for easier reading.

contradictioned Over a year ago

If you assume that the SQL engine does not optimize the query, it is even worse: the two queries on table test_event will be executed for every row in test_table.

Clockwork-Muse Over a year ago

@contradictioned - yes, but that's a terribly stupid optimizer that can't turn those queries into at least a temp-table-key-lookup, and the smart ones may actually be able to turn it into the equivalent of the actual JOIN-ed query.

face Over a year ago

@Clockwork-Muse I read in the PostgreSQL documentation itself that the two syntaxes don't differ much, so I went with this one. I can't comment on other databases; anyway, I've updated the answer.

contradictioned Over a year ago

@Clockwork-Muse: Indeed, and since he/she is using postgres, there's a good optimizer on stage. If someone forced to use MySQL is reading this, there might be a need for manual optimization. See the Postgres plan sqlfiddle.com/#!10/47756/1/0 vs the MySQL plan: sqlfiddle.com/#!2/477562/1/0 (I trust sqlfiddle to provide us the right plans :) )

Gordon Linoff · Accepted Answer · 2013-07-02 02:05:28Z

The question is whether sequence_id and test_id have to come from the same record in test_event. For instance, the pair (1, 2) satisfies the original query, because test_id 1 and sequence_id 2 both appear on rows with event_id = 1, but they are not on the same row.
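
To make that difference concrete, here is a small self-contained sketch using the sample data from the question. It assumes scratch TEMP tables, and it adds one extra test_table row (1, 2, 111, 222) that is not in the original data, purely for illustration.

```sql
-- Scratch copies of the question's tables (TEMP, so nothing permanent is touched).
CREATE TEMP TABLE test_table (
  test_id     integer NOT NULL,
  sequence_id integer NOT NULL,
  value1      integer NOT NULL,
  value2      integer NOT NULL,
  PRIMARY KEY (test_id, sequence_id)
);
CREATE TEMP TABLE test_event (
  event_id    integer NOT NULL,
  test_id     integer NOT NULL,
  sequence_id integer NOT NULL,
  PRIMARY KEY (event_id, test_id, sequence_id)
);

INSERT INTO test_table VALUES
  (1, 1, 200, 300), (2, 2, 400, 500), (2, 3, 600, 700),
  (2, 4, 300, 500), (2, 5, 200, 900),
  (1, 2, 111, 222);  -- ASSUMPTION: extra row, added for illustration only

INSERT INTO test_event VALUES
  (1, 1, 1), (1, 2, 2), (1, 2, 3), (2, 2, 4), (2, 2, 5);

-- Two independent IN lists: test_id and sequence_id are checked separately.
SELECT count(*) AS in_rows
FROM test_table
WHERE sequence_id IN (SELECT sequence_id FROM test_event WHERE event_id = 1)
  AND test_id     IN (SELECT test_id     FROM test_event WHERE event_id = 1);

-- Pairwise match: both columns must come from the same test_event row.
SELECT count(*) AS join_rows
FROM test_table t
JOIN test_event e
  ON e.test_id = t.test_id
 AND e.sequence_id = t.sequence_id
WHERE e.event_id = 1;
```

With this data the first query counts 4 rows and the second counts 3: the extra (1, 2) row passes both IN lists but matches no single test_event row. On the question's original five rows the two forms happen to return the same result.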

Your IN query is perhaps the best way to express this relationship. Another way is to use a join and aggregation:

SELECT tt.value1, tt.value2
FROM test_table tt join
     test_event te
     on te.event_id = 1
group by tt.test_id, tt.sequence_id, tt.value1, tt.value2
having sum(case when tt.sequence_id = te.sequence_id then 1 else 0 end) > 0 and
       sum(case when tt.test_id = te.test_id then 1 else 0 end) > 0;

This replaces the IN with a join (basically a cross join) and aggregation. I would guess that with indexes on te.sequence_id and te.event_id, your original version would be better.
