currently I'm struggeling with PostgreSQL 11, JSONB and database indices.
We have a customers table (shortened):
create table public.customers (
id uuid not null,
created timestamp without time zone,
lastmodified timestamp without time zone,
data jsonb,
primary key (id, mandantid)
);
The JSON in the data column looks like this (shortened):
{
"id": "...",
"$encrypted": {
"iv": "...",
"data": "...",
"keyRef": "key1"
}
}
What I would like to do is to create a query which counts all db-entries which i.e. do NOT have 'key1' as 'keyRef'. I'm not quite an PostgreSQL expert, so I tried several approaches.
I searched several documentation / stackoverflow / blogs and found the following approaches:
Approach 1
CREATE INDEX idx_customer_encryption_key ON customers(( (data->'$encrypted')::jsonb ->>'keyRef'::text));
COUNT(id) FROM customers WHERE data->'$encrypted' ->> 'keyRef' != 'key1';
Query works but is slow (no index is used).
Finalize Aggregate (cost=163614.73..163614.74 rows=1 width=8) (actual time=6604.325..6610.718 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550991 read=731452
-> Gather (cost=163614.51..163614.72 rows=2 width=8) (actual time=6604.315..6610.709 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550991 read=731452
-> Partial Aggregate (cost=162614.51..162614.52 rows=1 width=8) (actual time=6594.083..6594.083 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550991 read=731452
Worker 0: actual time=6588.966..6588.967 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.314 ms, Inlining 0.000 ms, Optimization 0.164 ms, Emission 3.179 ms, Total 3.657 ms"
Buffers: shared hit=2180408 read=243504
Worker 1: actual time=6589.102..6589.102 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.321 ms, Inlining 0.000 ms, Optimization 0.165 ms, Emission 3.174 ms, Total 3.661 ms"
Buffers: shared hit=2142345 read=238322
-> Parallel Seq Scan on public.customers (cost=0.00..160368.33 rows=898475 width=16) (actual time=3.728..6539.198 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
Filter: (((customers.data -> '$encrypted'::text) ->> 'keyRef'::text) <> 'key1'::text)
Buffers: shared hit=6550991 read=731452
Worker 0: actual time=3.343..6533.485 rows=720314 loops=1
Buffers: shared hit=2180408 read=243504
Worker 1: actual time=3.808..6535.494 rows=706508 loops=1
Buffers: shared hit=2142345 read=238322
Planning:
Buffers: shared hit=20 read=2 dirtied=1
Planning Time: 0.561 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.021 ms, Inlining 0.000 ms, Optimization 0.569 ms, Emission 10.153 ms, Total 11.744 ms"
Execution Time: 6611.164 ms
Approach 2
CREATE INDEX idx_customer_encryption_key ON customers (jsonb_extract_path_text(data, '$encrypted', 'keyRef'));
SELECT COUNT(id) FROM customers WHERE jsonb_extract_path_text(data, '$encrypted', 'keyRef') != 'key1';
Query works but is slow (no index is used).
Finalize Aggregate (cost=161357.25..161357.26 rows=1 width=8) (actual time=6531.578..6538.081 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550991 read=731452 written=2
-> Gather (cost=161357.04..161357.25 rows=2 width=8) (actual time=6531.568..6538.072 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550991 read=731452 written=2
-> Partial Aggregate (cost=160357.04..160357.05 rows=1 width=8) (actual time=6521.380..6521.381 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550991 read=731452 written=2
Worker 0: actual time=6516.406..6516.407 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.297 ms, Inlining 0.000 ms, Optimization 0.186 ms, Emission 3.301 ms, Total 3.784 ms"
Buffers: shared hit=2185669 read=243926
Worker 1: actual time=6516.327..6516.328 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.331 ms, Inlining 0.000 ms, Optimization 0.161 ms, Emission 3.384 ms, Total 3.876 ms"
Buffers: shared hit=2155723 read=241152
-> Parallel Seq Scan on public.customers (cost=0.00..158110.85 rows=898475 width=16) (actual time=3.594..6466.590 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: (jsonb_extract_path_text(customers.data, VARIADIC '{$encrypted,keyRef}'::text[]) <> 'key1'::text)"
Buffers: shared hit=6550991 read=731452 written=2
Worker 0: actual time=3.490..6460.749 rows=724558 loops=1
Buffers: shared hit=2185669 read=243926
Worker 1: actual time=3.937..6460.802 rows=712789 loops=1
Buffers: shared hit=2155723 read=241152
Planning:
Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.453 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.931 ms, Inlining 0.000 ms, Optimization 0.516 ms, Emission 9.871 ms, Total 11.319 ms"
Execution Time: 6538.429 ms
Approach 3
CREATE INDEX idx_customer_encryption_key ON customers ( ((data #> '{$encrypted,keyRef}')::varchar));
SELECT COUNT(id) FROM customers WHERE (data #> '{$encrypted,keyRef}')::varchar <> 'initialKey';
Query works but is slow (no index is used).
Finalize Aggregate (cost=165872.20..165872.21 rows=1 width=8) (actual time=6914.265..6920.442 rows=1 loops=1)
Output: count(id)
Buffers: shared hit=6550978 read=731465 written=13
-> Gather (cost=165871.99..165872.20 rows=2 width=8) (actual time=6914.144..6920.432 rows=3 loops=1)
Output: (PARTIAL count(id))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550978 read=731465 written=13
-> Partial Aggregate (cost=164871.99..164872.00 rows=1 width=8) (actual time=6904.453..6904.454 rows=1 loops=3)
Output: PARTIAL count(id)
Buffers: shared hit=6550978 read=731465 written=13
Worker 0: actual time=6899.759..6899.759 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.349 ms, Inlining 0.000 ms, Optimization 0.173 ms, Emission 3.345 ms, Total 3.867 ms"
Buffers: shared hit=2172611 read=241759 written=7
Worker 1: actual time=6899.599..6899.600 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.347 ms, Inlining 0.000 ms, Optimization 0.174 ms, Emission 3.352 ms, Total 3.874 ms"
Buffers: shared hit=2175840 read=243601 written=3
-> Parallel Seq Scan on public.customers (cost=0.00..162625.80 rows=898475 width=16) (actual time=3.750..6848.950 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: ((((customers.data #> '{$encrypted,keyRef}'::text[]))::character varying)::text <> 'initialKey'::text)"
Buffers: shared hit=6550978 read=731465 written=13
Worker 0: actual time=3.532..6844.562 rows=716749 loops=1
Buffers: shared hit=2172611 read=241759 written=7
Worker 1: actual time=4.014..6843.198 rows=721111 loops=1
Buffers: shared hit=2175840 read=243601 written=3
Planning:
Buffers: shared hit=18 read=2 dirtied=1
Planning Time: 0.367 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.042 ms, Inlining 0.000 ms, Optimization 0.531 ms, Emission 10.215 ms, Total 11.788 ms"
Execution Time: 6920.835 ms
Approach 4
CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data -> '$encrypted') jsonb_path_ops);
SELECT COUNT(id) FROM customers WHERE data -> '$encrypted' @> '{"$encrypted": { "keyRef": "key1"}}';
Not exactly what I want, just to test if the index works. Uses an index but result always is 0.
Aggregate (cost=58652.33..58652.34 rows=1 width=8) (actual time=0.020..0.021 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=1 read=3
-> Bitmap Heap Scan on public.customers (cost=215.96..58598.15 rows=21672 width=341) (actual time=0.018..0.018 rows=0 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
Buffers: shared hit=1 read=3
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..210.54 rows=21672 width=0) (actual time=0.016..0.016 rows=0 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb)"
Buffers: shared hit=1 read=3
Planning:
Buffers: shared read=1
Planning Time: 0.680 ms
Execution Time: 0.062 ms
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{ "keyRef": "key1"}';
Counts correctly but is slow (does not use index).
Aggregate (cost=58652.33..58652.34 rows=1 width=8) (actual time=32023.053..32023.054 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6550821 read=731787
-> Bitmap Heap Scan on public.customers (cost=215.96..58598.15 rows=21672 width=341) (actual time=112.830..31779.441 rows=2167122 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Rows Removed by Index Recheck: 49
Heap Blocks: exact=45746 lossy=98820
Buffers: shared hit=6550821 read=731787
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..210.54 rows=21672 width=0) (actual time=106.353..106.354 rows=2167122 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Buffers: shared hit=2 read=303
Planning:
Buffers: shared hit=1
Planning Time: 0.068 ms
Execution Time: 32023.422 ms
Bergis's comment
explain(analyze, buffers, verbose, format text) SELECT COUNT(data) FROM customers WHERE NOT data -> '$encrypted' @> '{"$encrypted": { "keyRef": "key1"}}';
Finalize Aggregate (cost=161345.97..161345.98 rows=1 width=8) (actual time=6835.857..6842.275 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6550964 read=731479
-> Gather (cost=161345.75..161345.96 rows=2 width=8) (actual time=6835.611..6842.266 rows=3 loops=1)
Output: (PARTIAL count(data))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550964 read=731479
-> Partial Aggregate (cost=160345.75..160345.76 rows=1 width=8) (actual time=6821.304..6821.304 rows=1 loops=3)
Output: PARTIAL count(data)
Buffers: shared hit=6550964 read=731479
Worker 0: actual time=6814.290..6814.291 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.647 ms, Inlining 0.000 ms, Optimization 0.230 ms, Emission 3.716 ms, Total 5.592 ms"
Buffers: shared hit=2168672 read=241803
Worker 1: actual time=6814.158..6814.158 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.658 ms, Inlining 0.000 ms, Optimization 0.234 ms, Emission 3.721 ms, Total 5.613 ms"
Buffers: shared hit=2196206 read=244975
-> Parallel Seq Scan on public.customers (cost=0.00..158110.85 rows=893960 width=341) (actual time=4.449..6766.917 rows=722392 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: (NOT ((customers.data -> '$encrypted'::text) @> '{""$encrypted"": {""keyRef"": ""key1""}}'::jsonb))"
Buffers: shared hit=6550964 read=731479
Worker 0: actual time=5.220..6760.360 rows=717794 loops=1
Buffers: shared hit=2168672 read=241803
Worker 1: actual time=3.965..6761.016 rows=725142 loops=1
Buffers: shared hit=2196206 read=244975
Planning Time: 0.061 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 3.638 ms, Inlining 0.000 ms, Optimization 0.691 ms, Emission 11.364 ms, Total 15.692 ms"
Execution Time: 6842.656 ms
Approach 5
CREATE INDEX idx_customer_encryption_key ON customers USING GIN( ((data -> '$encrypted')::jsonb));
SELECT COUNT(data) FROM customers WHERE data -> '$encrypted' @> '{"keyRef" : "key1"}';
Query works but is slow (no index is used).
Aggregate (cost=58676.33..58676.34 rows=1 width=8) (actual time=27662.270..27662.272 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6551064 read=731894 written=13274
-> Bitmap Heap Scan on public.customers (cost=239.96..58622.15 rows=21672 width=341) (actual time=145.119..27449.141 rows=2167122 loops=1)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Recheck Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Rows Removed by Index Recheck: 49
Heap Blocks: exact=45746 lossy=98820
Buffers: shared hit=6551064 read=731894 written=13274
-> Bitmap Index Scan on idx_customer_encryption_key (cost=0.00..234.54 rows=21672 width=0) (actual time=138.368..138.369 rows=2167122 loops=1)
" Index Cond: ((customers.data -> '$encrypted'::text) @> '{""keyRef"": ""initialKey""}'::jsonb)"
Buffers: shared hit=240 read=415
Planning:
Buffers: shared hit=22 read=4 dirtied=1
Planning Time: 0.575 ms
Execution Time: 27662.936 ms
Approach 6
CREATE INDEX idx_customer_encryption_key ON customers USING gin( (data #> '{$encrypted,keyRef}') );
SELECT COUNT(data) from customers WHERE (data #>> '{$encrypted,keyRef}')::text = 'key1';
Query works but is slow (no index is used).
Finalize Aggregate (cost=159122.35..159122.36 rows=1 width=8) (actual time=7124.276..7130.794 rows=1 loops=1)
Output: count(data)
Buffers: shared hit=6550972 read=731471 written=307
-> Gather (cost=159122.14..159122.35 rows=2 width=8) (actual time=7124.001..7130.783 rows=3 loops=1)
Output: (PARTIAL count(data))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6550972 read=731471 written=307
-> Partial Aggregate (cost=158122.14..158122.15 rows=1 width=8) (actual time=7108.420..7108.420 rows=1 loops=3)
Output: PARTIAL count(data)
Buffers: shared hit=6550972 read=731471 written=307
Worker 0: actual time=7100.765..7100.765 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.632 ms, Inlining 0.000 ms, Optimization 0.411 ms, Emission 9.283 ms, Total 10.327 ms"
Buffers: shared hit=2175813 read=242587 written=85
Worker 1: actual time=7100.660..7100.661 rows=1 loops=1
JIT:
Functions: 5
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 0.620 ms, Inlining 0.000 ms, Optimization 0.393 ms, Emission 8.127 ms, Total 9.140 ms"
Buffers: shared hit=2180525 read=244599 written=117
-> Parallel Seq Scan on public.customers (cost=0.00..158110.85 rows=4515 width=341) (actual time=8.311..7054.700 rows=722374 loops=3)
" Output: id, created, lastmodified, mandantid, data, customernumberpseudonym, internetnumberpseudonym, deletiondate"
" Filter: ((customers.data #>> '{$encrypted,keyRef}'::text[]) = 'initialKey'::text)"
Rows Removed by Filter: 18
Buffers: shared hit=6550972 read=731471 written=307
Worker 0: actual time=10.292..7046.967 rows=718818 loops=1
Buffers: shared hit=2175813 read=242587 written=85
Worker 1: actual time=8.559..7046.322 rows=726031 loops=1
Buffers: shared hit=2180525 read=244599 written=117
Planning:
Buffers: shared hit=17 dirtied=1
Planning Time: 0.187 ms
JIT:
Functions: 17
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.666 ms, Inlining 0.000 ms, Optimization 1.118 ms, Emission 23.162 ms, Total 25.945 ms"
Execution Time: 7131.263 ms
Approach 7
Now I have no more ideas why the index is not used for the query.
Any ideas out there? Any help is highly appreciated!
COUNT(*)would use an index, since it has to scan all rows anywayNOT?