Postgres: Optimize complex Query?

Question

I have trouble optimizing queries once they get large, like the one below.

select distinct tm.proto_location from track_message tm where
   workflow_analytic_instance_id = 204 and tm.id in 
   (Select track_message_id from track_message_to_track_mapping where track_id in 
   (select distinct t.id from track t, track_item item where t.id = item.track_id and
   item.item_time between 1328816277089000 and 1328816287089000 and item.id in 
   (Select track_item_id from track_point tp where ST_Intersects(tp.track_position,
   ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326)))));

I'm not sure whether I need to restructure the query or add more indexes; I currently have only one, on track_position. Below is the EXPLAIN ANALYZE output for the query:

HashAggregate  (cost=3321073.27..3321099.83 rows=2656 width=126) (actual time=38937.642..38937.781 rows=341 loops=1)
  ->  Hash Semi Join  (cost=3312041.16..3321066.63 rows=2656 width=126) (actual time=38860.624..38937.235 rows=341 loops=1)
      Hash Cond: (tm.id = track_message_to_track_mapping.track_message_id)
      ->  Seq Scan on track_message tm  (cost=0.00..8441.48 rows=5280 width=134) (actual time=31.643..81.135 rows=5027 loops=1)
          Filter: (workflow_analytic_instance_id = 204)
      ->  Hash  (cost=3310705.63..3310705.63 rows=81402 width=8) (actual time=38824.785..38824.785 rows=1026 loops=1)
          Buckets: 4096  Batches: 4  Memory Usage: 11kB
          ->  Hash Join  (cost=3306662.03..3310705.63 rows=81402 width=8) (actual time=38741.641..38820.901 rows=1026 loops=1)
                Hash Cond: (track_message_to_track_mapping.track_id = t.id)
                ->  Seq Scan on track_message_to_track_mapping  (cost=0.00..2995.04 rows=162804 width=16) (actual time=0.023..36.404 rows=162678 loops=1)
                ->  Hash  (cost=3306623.23..3306623.23 rows=3104 width=8) (actual time=38737.721..38737.721 rows=1026 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 41kB
                      ->  Unique  (cost=3299618.84..3306592.19 rows=3104 width=8) (actual time=38578.330..38737.166 rows=1026 loops=1)
                            ->  Merge Join  (cost=3299618.84..3306584.43 rows=3104 width=8) (actual time=38578.327..38735.062 rows=10303 loops=1)
                                  Merge Cond: (t.id = item.track_id)
                                  ->  Index Scan using track_pkey on track t  (cost=0.00..6763.86 rows=162639 width=8) (actual time=0.020..122.626 rows=160111 loops=1)
                                  ->  Sort  (cost=3299617.79..3299625.55 rows=3104 width=8) (actual time=38571.786..38574.074 rows=10303 loops=1)
                                        Sort Key: item.track_id
                                        Sort Method: quicksort  Memory: 867kB
                                        ->  Hash Semi Join  (cost=2688037.93..3299437.75 rows=3104 width=8) (actual time=25663.691..38562.198 rows=10303 loops=1)
                                              Hash Cond: (item.id = tp.track_item_id)
                                              ->  Seq Scan on track_item item  (cost=0.00..598761.77 rows=17867 width=16) (actual time=1177.986..3128.122 rows=20606 loops=1)
                                                    Filter: ((item_time >= 1328816277089000::bigint) AND (item_time <= 1328816287089000::bigint))
                                              ->  Hash  (cost=2636161.58..2636161.58 rows=3161948 width=8) (actual time=24330.672..24330.672 rows=9485846 loops=1)
                                                    Buckets: 4096  Batches: 512 (originally 128)  Memory Usage: 1025kB
                                                    ->  Seq Scan on track_point tp  (cost=0.00..2636161.58 rows=3161948 width=8) (actual time=5.506..20772.158 rows=9485846 loops=1)
                                                          Filter: ((track_position && '0103000020E6100000010000000500000000000000000062C00000000000804D4000000000008044C000000000000047400000000000C052C00000000000002E400000000000C05FC0000000000000394000000000000062C00000000000804D40'::geometry) AND _st_intersects(track_position, '0103000020E6100000010000000500000000000000000062C00000000000804D4000000000008044C000000000000047400000000000C052C00000000000002E400000000000C05FC0000000000000394000000000000062C00000000000804D40'::geometry))
 Total runtime: 38938.104 ms

I can't change the tables, as the database was created by another company. I am, however, free to add indexes as I see fit. The tables used in the query are below.

CREATE TABLE d2d.track_message
(
id bigserial NOT NULL,
proto_location text,
workflow_analytic_instance_id bigint NOT NULL,
CONSTRAINT track_message_pkey PRIMARY KEY (id),
CONSTRAINT track_message_workflow_analytic_instance_id_fkey FOREIGN KEY(workflow_analytic_instance_id)
  REFERENCES d2d.workflow_analytic_instance (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track_message_to_track_mapping
(
id bigserial NOT NULL,
track_message_id bigint NOT NULL,
track_id bigint NOT NULL,
CONSTRAINT track_message_to_track_mapping_pkey PRIMARY KEY (id),
CONSTRAINT track_message_to_track_mapping_track_id_fkey FOREIGN KEY (track_id)
  REFERENCES d2d.track (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_message_to_track_mapping_track_message_id_fkey FOREIGN KEY (track_message_id)
  REFERENCES d2d.track_message (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track
(
id bigserial NOT NULL,
track_uuid text,
track_number text,
track_exercise_indicator_id bigint NOT NULL,
track_simulation_indicator_id bigint NOT NULL,
track_status_id bigint,
last_modified timestamp with time zone DEFAULT timezone('utc'::text, now()),
CONSTRAINT track_pkey PRIMARY KEY (id),
CONSTRAINT track_track_exercise_indicator_id_fkey FOREIGN KEY (track_exercise_indicator_id)
  REFERENCES d2d.track_exercise_indicator (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_simulation_indicator_id_fkey FOREIGN KEY (track_simulation_indicator_id)
  REFERENCES d2d.track_simulation_indicator (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_status_id_fkey FOREIGN KEY (track_status_id)
  REFERENCES d2d.track_status (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_track_uuid_key UNIQUE (track_uuid)
);

CREATE TABLE d2d.track_item
(
id bigserial NOT NULL,
track_item_type_id bigint NOT NULL,
item_time bigint NOT NULL,
image_source text,
track_id bigint NOT NULL,
CONSTRAINT track_item_pkey PRIMARY KEY (id),
CONSTRAINT track_item_track_id_fkey FOREIGN KEY (track_id)
  REFERENCES d2d.track (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_item_track_item_type_id_fkey FOREIGN KEY (track_item_type_id)
  REFERENCES d2d.track_item_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

CREATE TABLE d2d.track_point
(
id bigserial NOT NULL,
track_position d2d.geometry(PointZ,4326),
track_point_type_id bigint,
track_point_source_type_id bigint,
last_modified timestamp with time zone DEFAULT timezone('utc'::text, now()),
track_item_id bigint NOT NULL,
CONSTRAINT track_point_pkey PRIMARY KEY (id),
CONSTRAINT track_point_track_item_id_fkey FOREIGN KEY (track_item_id)
  REFERENCES d2d.track_item (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_point_track_point_source_type_id_fkey FOREIGN KEY (track_point_source_type_id)
  REFERENCES d2d.track_point_source_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT track_point_track_point_type_id_fkey1 FOREIGN KEY (track_point_type_id)
  REFERENCES d2d.track_point_type (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
);

Why is track_item.item_time a bigint? (I'd expect a timestamp.) And why is it not indexed? (IMHO this causes the seqscan.)
– joop, Commented Mar 3, 2014 at 16:33
The time is in ms; this has to do with the way the data is stored.
– Jeremy, Commented Mar 3, 2014 at 17:06

joop · Accepted Answer · 2014-03-03 17:28:42Z

1

First attempt: use EXISTS() instead of IN(), and add an index (assuming this could be UNIQUE, not sure) to track_item.item_time (untested, since I have no data, obviously):

-- Drop UNIQUE if item_time values can repeat.
CREATE UNIQUE INDEX ON track_item (item_time);

-- ----- 
EXPLAIN ANALYZE
SELECT DISTINCT tm.proto_location
FROM track_message tm
WHERE tm.workflow_analytic_instance_id = 204
AND EXISTS ( SELECT *
        FROM track_message_to_track_mapping tm2tm
        JOIN track t ON t.id = tm2tm.track_id
        JOIN track_item ti ON t.id = ti.track_id
        JOIN track_point tp ON ti.id = tp.track_item_id
        WHERE tm.id =tm2tm.track_message_id
        AND ti.item_time BETWEEN 1328816277089000 AND 1328816287089000
        AND ST_Intersects
                (tp.track_position
                , ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326)
                )
        )
        ;
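If item_time turns out not to be unique, a plain B-tree index should serve the range filter just as well. A minimal sketch, assuming the tables live in the d2d schema shown in the question; the index name is made up for illustration:

```sql
-- Assumption: item_time values can repeat, so the index is NOT declared UNIQUE.
-- The index name is hypothetical.
CREATE INDEX track_item_item_time_idx
    ON d2d.track_item (item_time);

-- Refresh planner statistics for the table after adding the index.
ANALYZE d2d.track_item;

-- Check whether the range filter now uses the index instead of a seq scan.
EXPLAIN
SELECT ti.id, ti.track_id
FROM d2d.track_item ti
WHERE ti.item_time BETWEEN 1328816277089000 AND 1328816287089000;
```

If the plan still shows a Seq Scan on track_item, the time range may simply cover too much of the table for the index to be cheaper.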



4 Comments

Jeremy Over a year ago

item_time isn't unique, but I created a non-unique index on item_time and made the other changes from your query, and it cut my query time in half, so thank you.

joop Over a year ago

As a general hint: try replacing IN() with EXISTS() wherever you can. In this case the planner produces a different plan, probably because of the 3-deep nesting of IN(). Also: when reading query plans, look for places where the estimated row count differs from the actual row count. Seqscans are not always bad; they can be OK if most of the table is needed anyway.

Jeremy Over a year ago

Would it help to add an index on my foreign keys, such as track_id in the track_item table? My datasets have millions of records, so I'm not sure whether that would help. If so, should I make it a combined index, i.e. create index on track_item (track_id, item_time)?

joop Over a year ago

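It can help. A foreign key requires (at least) a unique constraint (or primary key) on the referenced table, but PostgreSQL does not automatically create an index on the referencing column, so track_item.track_id and track_point.track_item_id stay unindexed unless you add indexes yourself.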
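On the foreign key columns: in PostgreSQL, declaring a foreign key does not by itself create an index on the referencing column, and the plan in the question scans track_item, track_point and track_message_to_track_mapping sequentially. A hedged sketch of how one might check for and add such indexes; the index names are hypothetical, and whether each one pays off should be confirmed with EXPLAIN ANALYZE:

```sql
-- List the indexes that already exist, to avoid creating duplicates.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'd2d'
  AND tablename IN ('track_item', 'track_point', 'track_message_to_track_mapping');

-- Index names below are hypothetical; column names come from the
-- table definitions in the question.
CREATE INDEX track_item_track_id_idx
    ON d2d.track_item (track_id);
CREATE INDEX track_point_track_item_id_idx
    ON d2d.track_point (track_item_id);
CREATE INDEX tm2tm_track_id_idx
    ON d2d.track_message_to_track_mapping (track_id);
CREATE INDEX tm2tm_track_message_id_idx
    ON d2d.track_message_to_track_mapping (track_message_id);

-- Refresh statistics so the planner sees the current row counts.
ANALYZE d2d.track_item;
ANALYZE d2d.track_point;
ANALYZE d2d.track_message_to_track_mapping;
```

Whether a combined index such as (track_id, item_time) beats two single-column indexes depends on the data, so it is worth comparing the plans before keeping either.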

Justin · Accepted Answer · 2014-03-04 12:35:45Z

0

I think you could use something like this, if I didn't make a mistake:

SELECT DISTINCT tm.proto_location
FROM track_message tm
INNER JOIN track_message_to_track_mapping tm2tm ON tm2tm.track_message_id = tm.id
INNER JOIN track t ON tm2tm.track_id = t.id
INNER JOIN track_item item ON t.id = item.track_id
-- the alias tp is needed because the WHERE clause refers to tp.track_position
INNER JOIN track_point tp ON tp.track_item_id = item.id
WHERE tm.workflow_analytic_instance_id = 204
  AND item.item_time BETWEEN 1328816277089000 AND 1328816287089000
  AND ST_Intersects(tp.track_position, ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))',4326));
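To compare this join form with the EXISTS form, both can be run under EXPLAIN (ANALYZE, BUFFERS). The plan in the question reports "Batches: 512 (originally 128)" for the hash built on track_point, which suggests the hash did not fit in work_mem, so a larger session-level work_mem may be worth trying. This is a sketch only: the 64MB value is an arbitrary illustration, not a tuned recommendation, and the aliases tm2tm and tp are chosen here for readability.

```sql
-- Session-level setting only; 64MB is an illustrative assumption.
SET work_mem = '64MB';

-- The estimate for track_point in the question's plan (3161948 rows) is
-- well below the actual count (9485846), so refresh statistics first.
ANALYZE d2d.track_point;

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT tm.proto_location
FROM d2d.track_message tm
INNER JOIN d2d.track_message_to_track_mapping tm2tm ON tm2tm.track_message_id = tm.id
INNER JOIN d2d.track t ON tm2tm.track_id = t.id
INNER JOIN d2d.track_item item ON t.id = item.track_id
INNER JOIN d2d.track_point tp ON tp.track_item_id = item.id
WHERE tm.workflow_analytic_instance_id = 204
  AND item.item_time BETWEEN 1328816277089000 AND 1328816287089000
  AND ST_Intersects(tp.track_position,
        ST_GeomFromText('POLYGON((-144 59, -41 46, -75 15, -127 25, -144 59))', 4326));

-- Return to the server default afterwards.
RESET work_mem;
```

In the output, nodes where the estimated rows differ widely from the actual rows, or where a Hash reports many batches, are the first places to look.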

