SELECT st.id
FROM
       station st, strike s
WHERE s.srikedatetime > (current_timestamp - interval '20 seconds')  and srikedatetime < current_timestamp
    AND ST_Distance_Sphere(st.Geometry, s.strikepoint)/1000 <= st.radius

The idea is that when a strike hits, it can fall within the boundary of multiple stations, depending on each station's monitoring radius. I need to select all stations affected by any strikes in the last 20 seconds. In 20 seconds there can be thousands of strikes, and when this query executes every 20 seconds the CPU goes high and the query takes minutes to complete. When the CPU is not high, it runs in milliseconds.

Query Plan:

"Nested Loop  (cost=0.42..505110.20 rows=21 width=7)"
"  Join Filter: ((_st_distance(geography(a.Geometry), geography(s.strikepoint), 0::double precision, false) / 1000::double precision) <= (a.radius)::double precision)"
"  ->  Index Only Scan using dup_strike_constraint on strike s  (cost=0.42..8.45 rows=1 width=36)"
"        Index Cond: ((srikedatetime > (now() - '00:00:20'::interval)) AND (srikedatetime < now()))"
"  ->  Seq Scan on station  st  (cost=0.00..505084.86 rows=62 width=549)"

I have tried inner join, something like this

Inner JOIN strike s ON ST_Distance(a.Geometry, s.strikepoint) < 1

and also tried ST_DWithin in the WHERE clause with grouping; still slow.

ST_DWithin(s.strikepoint,a.Geometry, a.radius)

Any thoughts, please? I have indexes on strike and station tables.

The data types of the s.strikepoint and st.Geometry columns are geometry. The coordinate system is 4326. Thank you!


3 Answers


You could create a lazy materialized view to store just the data you need. It's a lot faster to query one table than to join the 'station' and 'strike' tables.

Obviously, your lazy table should be indexed properly, especially if you don't plan to 'clean' the data periodically.

Another option is using a TEMP table to handle the data, if you don't need to store it historically and you handle the data in one session. TEMP tables are visible only to one session and 'die' when the session ends. You can create and destroy them at any point, but in this case you should pay attention to memory usage.
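A minimal sketch of the trigger-maintained "lazy" table idea, using the strike columns from the question; the table, function, and trigger names here are illustrative, not from your schema:

```sql
-- Sketch only: station_strike, record_affected_stations, and the trigger
-- name are illustrative. Each strike insert also records the affected
-- station ids, so the 20-second poll becomes a plain indexed lookup.
CREATE TABLE station_strike (
    station_id     integer,
    srikedatetime  timestamptz
);

CREATE OR REPLACE FUNCTION record_affected_stations() RETURNS trigger AS $$
BEGIN
    INSERT INTO station_strike (station_id, srikedatetime)
    SELECT st.id, NEW.srikedatetime
    FROM station st
    WHERE ST_Distance_Sphere(st.Geometry, NEW.strikepoint) / 1000 <= st.radius;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_record_affected_stations
AFTER INSERT ON strike
FOR EACH ROW EXECUTE PROCEDURE record_affected_stations();

CREATE INDEX station_strike_time_idx ON station_strike (srikedatetime);

-- The periodic poll then touches a single indexed table:
SELECT DISTINCT station_id
FROM station_strike
WHERE srikedatetime > current_timestamp - interval '20 seconds';
```

This moves the distance computation from the every-20-seconds query to insert time, where it runs once per strike instead of once per poll.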

Hope it helps.


4 Comments

I clean the strike table every hour or so, since it grows very fast; the station table is small, about 5000 records containing points and polylines. Strikes come in continuously from a service, so every 15 seconds I run this query to find the stations affected by strikes in the last 15 seconds. I'm not sure how I could use a temp table or materialized view in this case.
Well, I don't know your DB structure, but the idea of a materialized view is to remove processing from the query and move it to the insert. You could create a trigger that inserts the relevant data into this "materialized" table, so the query ends up being just "SELECT st.id FROM view_standar st WHERE date > (current_timestamp - interval '20 seconds')", where the processing cost is almost nothing. It would also help to use a different tablespace for this materialized table, to make access to the data faster.
By the way, if you are constantly deleting rows, you should periodically vacuum your tables. If you're already doing so, good job. If not, that may be a factor in your slow query times.
I do vacuum and the database is pretty vacuumed at all times.

Since your stations change rarely, you can add a field of type Geometry holding the polygon that represents the registration area of the station, e.g. populated with ST_Buffer(Geometry, radius/1000). Then check for a hit using ST_Contains(st.the_polygon, s.strikepoint).

If this doesn't speed things up, you can then add an index on the station polygon.
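A sketch of that idea, assuming st.radius is in kilometres (as the /1000 in the original query suggests); the coverage column and index names are illustrative:

```sql
-- Sketch only: the coverage column and index names are illustrative.
-- Precompute each station's coverage polygon once, index it, and test
-- strikes with an index-assisted containment check.
ALTER TABLE station ADD COLUMN coverage geometry;

-- ST_Buffer on geography takes metres, hence radius * 1000 for km.
UPDATE station st
SET coverage = ST_Buffer(st.Geometry::geography, st.radius * 1000)::geometry;

CREATE INDEX station_coverage_gix ON station USING GIST (coverage);

SELECT st.id
FROM station st, strike s
WHERE s.srikedatetime > current_timestamp - interval '20 seconds'
  AND s.srikedatetime < current_timestamp
  AND ST_Contains(st.coverage, s.strikepoint);
```

ST_Contains can use the GiST index on coverage, so the planner can avoid the sequential scan over station that appears in the posted query plan.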

Comments


I would try to reduce the use of the potentially costly function ST_Distance_Sphere by cutting down the number of candidate matches. Instead of computing the exact spherical distance for every candidate pair, first check a bounding box around each station; only the strikes inside the box are then verified with ST_Distance_Sphere().

SELECT st.id
FROM
       station st, strike s
WHERE s.srikedatetime > (current_timestamp - interval '20 seconds')  and srikedatetime < current_timestamp
    AND ST_Distance_Sphere(st.Geometry, s.strikepoint)/1000 <= st.radius
    and ST_X(s.strikepoint) between ST_X(st.Geometry) - st.radius*1000 and ST_X(st.Geometry) + st.radius*1000
    and ST_Y(s.strikepoint) between ST_Y(st.Geometry) - st.radius*1000 and ST_Y(st.Geometry) + st.radius*1000

As a next step, you could store ST_X(st.Geometry) - st.radius*1000 as st.range_x_negative, to avoid recalculating it during the join.
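A sketch of the precomputed-range idea; the range_* column names follow the suggestion above and are not part of the original schema. Note that with SRID 4326 the coordinates are in degrees, so the kilometre radius is converted here with a rough factor of about 111 km per degree rather than multiplied by 1000:

```sql
-- Sketch only: range_* columns are illustrative. Coordinates in SRID 4326
-- are degrees, so the km radius is converted with ~111 km per degree.
-- (Rough: for stations far from the equator, the longitude range should be
-- widened by 1/cos(latitude) to avoid missing matches.)
ALTER TABLE station
    ADD COLUMN range_x_negative double precision,
    ADD COLUMN range_x_positive double precision,
    ADD COLUMN range_y_negative double precision,
    ADD COLUMN range_y_positive double precision;

UPDATE station
SET range_x_negative = ST_X(Geometry) - radius / 111.0,
    range_x_positive = ST_X(Geometry) + radius / 111.0,
    range_y_negative = ST_Y(Geometry) - radius / 111.0,
    range_y_positive = ST_Y(Geometry) + radius / 111.0;

-- The join then compares plain numeric columns before the exact distance test:
SELECT st.id
FROM station st, strike s
WHERE s.srikedatetime > current_timestamp - interval '20 seconds'
  AND s.srikedatetime < current_timestamp
  AND ST_X(s.strikepoint) BETWEEN st.range_x_negative AND st.range_x_positive
  AND ST_Y(s.strikepoint) BETWEEN st.range_y_negative AND st.range_y_positive
  AND ST_Distance_Sphere(st.Geometry, s.strikepoint) / 1000 <= st.radius;
```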

Comments
