3

I have the following PostGIS/greSQL query

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(sn.geom, luc.geom)

Query plan:

Nested Loop  (cost=2.45..53.34 rows=8 width=236) (actual time=1.030..26.751 rows=1282 loops=1)
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.023..0.029 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Bitmap Heap Scan on lucas12 luc  (cost=2.45..51.06 rows=8 width=236) (actual time=1.002..26.031 rows=1282 loops=1)
        Recheck Cond: (germany_bld.geom ~ geom)
        Filter: _st_contains(germany_bld.geom, geom)
        Rows Removed by Filter: 499
        Heap Blocks: exact=174
        ->  Bitmap Index Scan on lucas12_geom_idx  (cost=0.00..2.45 rows=23 width=0) (actual time=0.419..0.419 rows=1781 loops=1)
              Index Cond: (germany_bld.geom ~ geom)
Planning time: 0.536 ms
Execution time: 27.023 ms

which is due to an index on the geometry columns pretty fast. However when I want to add a buffer to the sn polygon (1 big polygon that represents a border line, hence a quite simple feature):

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT ST_Buffer(geom, 30000) geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(sn.geom, luc.geom)

Query plan:

Nested Loop  (cost=0.00..13234.80 rows=7818 width=236) (actual time=6221.391..1338380.257 rows=2298 loops=1)
  Join Filter: st_contains(st_buffer(germany_bld.geom, 30000::double precision), luc.geom)
  Rows Removed by Join Filter: 22637
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.018..0.036 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Seq Scan on lucas12 luc  (cost=0.00..1270.55 rows=23455 width=236) (actual time=0.005..25.623 rows=24935 loops=1)
Planning time: 0.271 ms
Execution time: 1338381.079 ms

the query takes forever! I blame it on the not existing index in the temporally table sn. The massive decrease in speed can't be 'caused by ST_Buffer() as it's itself really fast and the buffered feature is simple.

Two Questions:

1) Am I right?

2) What can I do, to reach similar speed as with the first query?

5
  • please provide plans for both queries Commented May 16, 2017 at 19:28
  • added the query plans Commented May 17, 2017 at 8:36
  • Why use a subquery? You could use ... x JOIN y ON ST_contains(x.a, y.b) ... Commented May 17, 2017 at 10:38
  • @wildplasser you're right, however using the wron function was the reason for the decrease in speed. Commented May 17, 2017 at 11:42
  • My guess is that ST_buffer() renders the join-condition non-sargeable, since it hides the indexed field inside the function. Commented May 17, 2017 at 11:47

2 Answers 2

2

I've ran into a trap. ST_Buffer() is not the right choice here rather ST_DWithin() which keeps the indexes of every geometry column when actually performing a bounding box comparison. The help page for ST_Buffer() clearly states to not make the mistake using ST_Buffer(), but instead use ST_DWithin() for radius searches. Since the word Buffer is used in a lot of GIS softwares I didn't consider looking for alternatives.

SELECT luc.*
FROM spatial_derived.lucas12 luc
JOIN spatial_derived.germany_bld sn ON ST_DWithin(sn.geom, luc.geom, 30000)
WHERE bld.state = 'SN'

works and only takes a second (2300 points within that "buffer")!

Sign up to request clarification or add additional context in comments.

Comments

0

to check if you right, you can leave sn as is and apply ST_Buffer on join:

SELECT luc.*
FROM spatial_derived.lucas12 luc,
  (SELECT geom
   FROM spatial_derived.germany_bld
   WHERE state = 'SN') sn
WHERE ST_Contains(ST_Buffer(sn.geom, 30000), luc.geom)

Query plan:

Nested Loop  (cost=0.00..13234.80 rows=7818 width=236) (actual time=6237.876..1340000.576 rows=2298 loops=1)
  Join Filter: st_contains(st_buffer(germany_bld.geom, 30000::double precision), luc.geom)
  Rows Removed by Join Filter: 22637
  ->  Seq Scan on germany_bld  (cost=0.00..2.20 rows=1 width=18399) (actual time=0.023..0.038 rows=1 loops=1)
        Filter: ((state)::text = 'SN'::text)
        Rows Removed by Filter: 15
  ->  Seq Scan on lucas12 luc  (cost=0.00..1270.55 rows=23455 width=236) (actual time=0.004..24.525 rows=24935 loops=1)
Planning time: 0.453 ms
Execution time: 1340001.420 ms

this query will answer both your questions or first, depending on result.

Update

  1. Your assumption seems to be wrong. The ST_Buffer() causes speed drop down
  2. You seem to join on much larger set when using the ST_Buffer, so time increase is quite expected. You can run explain analyze for both with and without ST_Buffer() queries - it probably will show same plans with different rows number and cost second value...

7 Comments

your solution takes 22min while the same query without ST_Buffer() takes 593msec. Still a way to huge difference.
When runing SELECT ST_Buffer(geom, 30000) FROM spatial_derived.germany_bld WHERE state = 'SN' it also only takes 76msec. So is it really 'causing the speed drop down? I will run EXPLAIN ANALYZE.
oh, yes please check with analyze for sure. if you have smth strange - dont ommit analyze
@andrasz please check plan for: with sn as ( SELECT geom FROM spatial_derived.germany_bld WHERE state = 'SN' ) SELECT luc.* FROM spatial_derived.lucas12 luc JOIN sn on sn.geom = luc.geom WHERE ST_Contains(ST_Buffer(sn.geom, 30000), luc.geom)
this is really fast, however has no results
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.