SELECT st.id
FROM
       station st, strike s
WHERE s.srikedatetime > (current_timestamp - interval '20 seconds')  and srikedatetime < current_timestamp
    AND ST_Distance_Sphere(st.Geometry, s.strikepoint)/1000 <= st.radius

The idea is that when a strike hits, it can fall within the boundary of multiple stations, depending on each station's monitoring radius. I need to select all stations affected by any strikes in the last 20 seconds. In 20 seconds there can be thousands of strikes, and when this query executes every 20 seconds the CPU goes high and the query takes minutes to complete. When the CPU is not high, it runs in milliseconds.

Query Plan:

"Nested Loop  (cost=0.42..505110.20 rows=21 width=7)"
"  Join Filter: ((_st_distance(geography(a.Geometry), geography(s.strikepoint), 0::double precision, false) / 1000::double precision) <= (a.radius)::double precision)"
"  ->  Index Only Scan using dup_strike_constraint on strike s  (cost=0.42..8.45 rows=1 width=36)"
"        Index Cond: ((srikedatetime > (now() - '00:00:20'::interval)) AND (srikedatetime < now()))"
"  ->  Seq Scan on station  st  (cost=0.00..505084.86 rows=62 width=549)"

I have tried inner join, something like this

Inner JOIN strike s ON ST_Distance(a.Geometry, s.strikepoint) < 1

and also tried ST_DWithin in the WHERE clause with grouping; still slow.

ST_DWithin(s.strikepoint,a.Geometry, a.radius)

Any thoughts, please? I have indexes on strike and station tables.

The data types of the s.strikepoint and st.Geometry columns are geometry. The coordinate system is 4326. Thank you!


3 Answers


You could create a lazy materialized view to store just the data you need. It's a lot faster to query one table than to join the 'station' and 'strike' tables.

Obviously, your lazy table should be indexed properly, especially if you don't plan to 'clean' the data periodically.

Another option is using a TEMP table to handle the data, if you don't need to store it historically and you handle the data in one session. TEMP tables are visible only to one session and 'die' when the session ends. You can create and destroy them at any point, but in this case you should pay attention to memory usage.
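A minimal sketch of the trigger-maintained "lazy" table idea, using the strike columns from the question; the table, function, and trigger names here are illustrative, not from your schema:

```sql
-- Sketch only: station_strike, record_affected_stations, and the trigger
-- name are illustrative. Each strike insert also records the affected
-- station ids, so the 20-second poll becomes a plain indexed lookup.
CREATE TABLE station_strike (
    station_id     integer,
    srikedatetime  timestamptz
);

CREATE OR REPLACE FUNCTION record_affected_stations() RETURNS trigger AS $$
BEGIN
    INSERT INTO station_strike (station_id, srikedatetime)
    SELECT st.id, NEW.srikedatetime
    FROM station st
    WHERE ST_Distance_Sphere(st.Geometry, NEW.strikepoint) / 1000 <= st.radius;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_record_affected_stations
AFTER INSERT ON strike
FOR EACH ROW EXECUTE PROCEDURE record_affected_stations();

CREATE INDEX station_strike_time_idx ON station_strike (srikedatetime);

-- The periodic poll then touches a single indexed table:
SELECT DISTINCT station_id
FROM station_strike
WHERE srikedatetime > current_timestamp - interval '20 seconds';
```

This moves the distance computation from the every-20-seconds query to insert time, where it runs once per strike instead of once per poll.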

Hope it helps.


4 Comments

I clean the strike table every hour or so, since it grows very fast; the station table is small, about 5000 records containing points and polylines. Strikes come in continuously from a service, so every 15 seconds I run this query to find the stations affected by strikes in the last 15 seconds. I'm not sure how I could use a temp table or materialized view in this case.
Well, I don't know your DB structure, but the idea of a materialized view is to remove processing from the query and move it to the insert. You could create a trigger that inserts the relevant data into this "materialized" table, so the query ends up being just "SELECT st.id FROM view_standar st WHERE date > (current_timestamp - interval '20 seconds')", where the processing cost is almost nothing. It would also help to use a different tablespace for this materialized table, to make access to the data faster.
By the way, if you are constantly deleting rows, you should periodically vacuum your tables. If you're already doing so, good job. If not, that may be a factor in your slow query times.
I do vacuum and the database is pretty vacuumed at all times.

Since your stations change rarely, you can add a field of type Geometry holding the polygon that represents the registration area of the station, e.g. populated with ST_Buffer(Geometry, radius/1000). Then check for a hit using ST_Contains(st.the_polygon, s.strikepoint).

If this doesn't speed things up, you can then add an index on the station polygon.
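A sketch of that idea, assuming st.radius is in kilometres (as the /1000 in the original query suggests); the coverage column and index names are illustrative:

```sql
-- Sketch only: the coverage column and index names are illustrative.
-- Precompute each station's coverage polygon once, index it, and test
-- strikes with an index-assisted containment check.
ALTER TABLE station ADD COLUMN coverage geometry;

-- ST_Buffer on geography takes metres, hence radius * 1000 for km.
UPDATE station st
SET coverage = ST_Buffer(st.Geometry::geography, st.radius * 1000)::geometry;

CREATE INDEX station_coverage_gix ON station USING GIST (coverage);

SELECT st.id
FROM station st, strike s
WHERE s.srikedatetime > current_timestamp - interval '20 seconds'
  AND s.srikedatetime < current_timestamp
  AND ST_Contains(st.coverage, s.strikepoint);
```

ST_Contains can use the GiST index on coverage, so the planner can avoid the sequential scan over station that appears in the posted query plan.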

Comments


I would try to reduce the use of the potentially costly function ST_Distance_Sphere by cutting down the number of candidate matches. Instead of computing the exact spherical distance for every candidate pair, first check a bounding box around each station; only the strikes inside the box are then verified with ST_Distance_Sphere().

SELECT st.id
FROM
       station st, strike s
WHERE s.srikedatetime > (current_timestamp - interval '20 seconds')  and srikedatetime < current_timestamp
    AND ST_Distance_Sphere(st.Geometry, s.strikepoint)/1000 <= st.radius
    and ST_X(s.strikepoint) between ST_X(st.Geometry) - st.radius*1000 and ST_X(st.Geometry) + st.radius*1000
    and ST_Y(s.strikepoint) between ST_Y(st.Geometry) - st.radius*1000 and ST_Y(st.Geometry) + st.radius*1000

As a next step, you could store ST_X(st.Geometry) - st.radius*1000 as st.range_x_negative, to avoid recalculating it during the join.
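A sketch of the precomputed-range idea; the range_* column names follow the suggestion above and are not part of the original schema. Note that with SRID 4326 the coordinates are in degrees, so the kilometre radius is converted here with a rough factor of about 111 km per degree rather than multiplied by 1000:

```sql
-- Sketch only: range_* columns are illustrative. Coordinates in SRID 4326
-- are degrees, so the km radius is converted with ~111 km per degree.
-- (Rough: for stations far from the equator, the longitude range should be
-- widened by 1/cos(latitude) to avoid missing matches.)
ALTER TABLE station
    ADD COLUMN range_x_negative double precision,
    ADD COLUMN range_x_positive double precision,
    ADD COLUMN range_y_negative double precision,
    ADD COLUMN range_y_positive double precision;

UPDATE station
SET range_x_negative = ST_X(Geometry) - radius / 111.0,
    range_x_positive = ST_X(Geometry) + radius / 111.0,
    range_y_negative = ST_Y(Geometry) - radius / 111.0,
    range_y_positive = ST_Y(Geometry) + radius / 111.0;

-- The join then compares plain numeric columns before the exact distance test:
SELECT st.id
FROM station st, strike s
WHERE s.srikedatetime > current_timestamp - interval '20 seconds'
  AND s.srikedatetime < current_timestamp
  AND ST_X(s.strikepoint) BETWEEN st.range_x_negative AND st.range_x_positive
  AND ST_Y(s.strikepoint) BETWEEN st.range_y_negative AND st.range_y_positive
  AND ST_Distance_Sphere(st.Geometry, s.strikepoint) / 1000 <= st.radius;
```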

Comments
