@@ -110,75 +110,70 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110110 FROM
111111 generate_series(1,10000000);
112112SELECT 10000000
113- =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114- CREATE INDEX
115- =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116- pg_size_pretty
117- ----------------
118- 153 MB
119- (1 row)
120- =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121- CREATE INDEX
122- =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123- pg_size_pretty
124- ----------------
125- 387 MB
126- (1 row)
127113</programlisting>
128114
129115 <para>
130116 A sequential scan over this large table takes a long time:
131117<programlisting>
132118=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133- QUERY PLAN
134- -------------------------------------------------------------------&zwsp;-----------------------------------------
135- Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119+ QUERY PLAN
120+ -------------------------------------------------------------------&zwsp;-----------------------------------
121+ Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=16.971..16.971 rows=0 loops=1)
136122 Filter: ((i2 = 898732) AND (i5 = 123451))
137- Rows Removed by Filter: 10000000
138- Planning time : 0.177 ms
139- Execution time: 1445.473 ms
123+ Rows Removed by Filter: 100000
124+ Planning Time : 0.346 ms
125+ Execution Time: 16.988 ms
140126(5 rows)
141127</programlisting>
142128 </para>
143129
144130 <para>
145- So the planner will usually select an index scan if possible.
146- With a btree index, we get results like this :
131+ Even with the btree index defined the result will still be a
132+ sequential scan :
147133<programlisting>
134+ =# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
135+ CREATE INDEX
136+ =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
137+ pg_size_pretty
138+ ----------------
139+ 3976 kB
140+ (1 row)
148141=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149- QUERY PLAN
150- -------------------------------------------------------------------&zwsp;-------------------------------------------------------------
151- Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152- Index Cond : ((i2 = 898732) AND (i5 = 123451))
153- Heap Fetches: 0
154- Planning time : 0.193 ms
155- Execution time: 445.770 ms
142+ QUERY PLAN
143+ -------------------------------------------------------------------&zwsp;-----------------------------------
144+ Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.805..12.805 rows=0 loops=1)
145+ Filter : ((i2 = 898732) AND (i5 = 123451))
146+ Rows Removed by Filter: 100000
147+ Planning Time : 0.138 ms
148+ Execution Time: 12.817 ms
156149(5 rows)
157150</programlisting>
158151 </para>
159152
160153 <para>
161- Bloom is better than btree in handling this type of search:
154+ Having the bloom index defined on the table is better than btree in
155+ handling this type of search:
162156<programlisting>
157+ =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
158+ CREATE INDEX
159+ =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
160+ pg_size_pretty
161+ ----------------
162+ 1584 kB
163+ (1 row)
163164=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164- QUERY PLAN
165- -------------------------------------------------------------------&zwsp;--------------------------------------------------------
166- Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
165+ QUERY PLAN
166+ -------------------------------------------------------------------&zwsp;--------------------------------------------------
167+ Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1)
167168 Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168- Rows Removed by Index Recheck: 2439
169- Heap Blocks: exact=2408
170- -> Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
169+ Rows Removed by Index Recheck: 29
170+ Heap Blocks: exact=28
171+ -> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1)
171172 Index Cond: ((i2 = 898732) AND (i5 = 123451))
172- Planning time : 0.475 ms
173- Execution time: 76.778 ms
173+ Planning Time : 0.099 ms
174+ Execution Time: 0.408 ms
174175(8 rows)
175176</programlisting>
176- Note the relatively large number of false positives: 2439 rows were
177- selected to be visited in the heap, but none actually matched the
178- query. We could reduce that by specifying a larger signature length.
179- In this example, creating the index with <literal>length=200</literal>
180- reduced the number of false positives to 55; but it doubled the index size
181- (to 306 MB) and ended up being slower for this query (125 ms overall).
182177 </para>
183178
184179 <para>
@@ -187,24 +182,36 @@ CREATE INDEX
187182 A better strategy for btree is to create a separate index on each column.
188183 Then the planner will choose something like this:
189184<programlisting>
185+ =# CREATE INDEX btreeidx1 ON tbloom (i1);
186+ CREATE INDEX
187+ =# CREATE INDEX btreeidx2 ON tbloom (i2);
188+ CREATE INDEX
189+ =# CREATE INDEX btreeidx3 ON tbloom (i3);
190+ CREATE INDEX
191+ =# CREATE INDEX btreeidx4 ON tbloom (i4);
192+ CREATE INDEX
193+ =# CREATE INDEX btreeidx5 ON tbloom (i5);
194+ CREATE INDEX
195+ =# CREATE INDEX btreeidx6 ON tbloom (i6);
196+ CREATE INDEX
190197=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191- QUERY PLAN
192- -------------------------------------------------------------------&zwsp;-----------------------------------------------------------
193- Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148 ..0.148 rows=0 loops=1)
198+ QUERY PLAN
199+ -------------------------------------------------------------------&zwsp;--------------------------------------------------------
200+ Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.028 ..0.029 rows=0 loops=1)
194201 Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195- -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145 ..0.145 rows=0 loops=1)
196- -> Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089 ..0.089 rows=10 loops=1)
202+ -> BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.027 ..0.027 rows=0 loops=1)
203+ -> Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.026 ..0.026 rows=0 loops=1)
197204 Index Cond: (i5 = 123451)
198- -> Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1 )
205+ -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed )
199206 Index Cond: (i2 = 898732)
200- Planning time: 2.049 ms
201- Execution time : 0.280 ms
207+ Planning Time: 0.491 ms
208+ Execution Time : 0.055 ms
202209(9 rows)
203210</programlisting>
204211 Although this query runs much faster than with either of the single
205- indexes, we pay a large penalty in index size. Each of the single-column
206- btree indexes occupies 214 MB, so the total space needed is over 1.2GB ,
207- more than 8 times the space used by the bloom index.
212+ indexes, we pay a penalty in index size. Each of the single-column
213+ btree indexes occupies 2 MB, so the total space needed is 12 MB ,
214+ eight times the space used by the bloom index.
208215 </para>
209216 </sect2>
210217
0 commit comments