1- <!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.41 2003/05/15 15:50:18 petere Exp $ -->
1+ <!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.42 2003/05/28 16:03:55 tgl Exp $ -->
22
33<chapter id="indexes">
44 <title id="indexes-title">Indexes</title>
2020 <title>Introduction</title>
2121
2222 <para>
23- The classical example for the need of an index is if there is a
24- table similar to this:
23+ Suppose we have a table similar to this:
2524<programlisting>
2625CREATE TABLE test1 (
2726 id integer,
@@ -32,24 +31,24 @@ CREATE TABLE test1 (
3231<programlisting>
3332SELECT content FROM test1 WHERE id = <replaceable>constant</replaceable>;
3433</programlisting>
35- Ordinarily, the system would have to scan the entire
36- <structname>test1</structname> table row by row to find all
34+ With no advance preparation, the system would have to scan the entire
35+ <structname>test1</structname> table, row by row, to find all
3736 matching entries. If there are a lot of rows in
38- <structname>test1</structname> and only a few rows (possibly zero
39- or one) returned by the query, then this is clearly an inefficient
40- method. If the system were instructed to maintain an index on the
41- <structfield>id</structfield> column, then it could use a more
37+ <structname>test1</structname> and only a few rows (perhaps only zero
38+ or one) that would be returned by such a query, then this is clearly an
39+ inefficient method. But if the system has been instructed to maintain an
40+ index on the <structfield>id</structfield> column, then it can use a more
4241 efficient method for locating matching rows. For instance, it
4342 might only have to walk a few levels deep into a search tree.
4443 </para>
4544
4645 <para>
47- A similar approach is used in most books of non-fiction: Terms and
46+ A similar approach is used in most books of non-fiction: terms and
4847 concepts that are frequently looked up by readers are collected in
4948 an alphabetic index at the end of the book. The interested reader
5049 can scan the index relatively quickly and flip to the appropriate
51- page, and would not have to read the entire book to find the
52- interesting location. As it is the task of the author to
50+ page(s), rather than having to read the entire book to find the
51+ material of interest. Just as it is the task of the author to
5352 anticipate the items that the readers are most likely to look up,
5453 it is the task of the database programmer to foresee which indexes
5554 would be of advantage.
@@ -73,13 +72,14 @@ CREATE INDEX test1_id_index ON test1 (id);
7372
7473 <para>
7574 Once the index is created, no further intervention is required: the
76- system will use the index when it thinks it would be more efficient
75+ system will update the index when the table is modified, and it will
76+ use the index in queries when it thinks this would be more efficient
7777 than a sequential table scan. But you may have to run the
7878 <command>ANALYZE</command> command regularly to update
7979 statistics to allow the query planner to make educated decisions.
8080 Also read <xref linkend="performance-tips"> for information about
8181 how to find out whether an index is used and when and why the
82- planner may choose to <emphasis>not</emphasis> use an index.
82+ planner may choose <emphasis>not</emphasis> to use an index.
8383 </para>
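The paragraph above says the system uses an index only when the planner judges it cheaper than a sequential scan, and that statistics should be refreshed with ANALYZE. A minimal sketch of checking this follows. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in so the example runs without a server; in PostgreSQL the corresponding tool is the EXPLAIN command, and its output format differs. The column type of content, the helper plan() and the row count are hypothetical choices made for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only the general idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (id integer, content varchar)")
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1000)],
)


def plan(sql):
    """Return the planner's own description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " / ".join(row[-1] for row in rows)


query = "SELECT content FROM test1 WHERE id = 42"

plan_before = plan(query)  # no index exists yet, so the whole table is scanned
conn.execute("CREATE INDEX test1_id_index ON test1 (id)")
conn.execute("ANALYZE")    # refresh the statistics the planner relies on
plan_after = plan(query)   # the planner can now look the row up in the index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the switch from a full scan to an index lookup; no change to the query itself is needed.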
8484
8585 <para>
@@ -198,7 +198,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
198198 than B-tree indexes, and the index size and build time for hash
199199 indexes is much worse. Hash indexes also suffer poor performance
200200 under high concurrency. For these reasons, hash index use is
201- discouraged.
201+ presently discouraged.
202202 </para>
203203 </note>
204204 </para>
@@ -250,14 +250,13 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor);
250250 Currently, only the B-tree and GiST implementations support multicolumn
251251 indexes. Up to 32 columns may be specified. (This limit can be
252252 altered when building <productname>PostgreSQL</productname>; see the
253- file <filename>pg_config.h</filename>.)
253+ file <filename>pg_config_manual.h</filename>.)
254254 </para>
255255
256256 <para>
257257 The query planner can use a multicolumn index for queries that
258- involve the leftmost column in the index definition and any number
259- of columns listed to the right of it without a gap (when
260- used with appropriate operators). For example,
258+ involve the leftmost column in the index definition plus any number
259+ of columns listed to the right of it, without a gap. For example,
261260 an index on <literal>(a, b, c)</literal> can be used in queries
262261 involving all of <literal>a</literal>, <literal>b</literal>, and
263262 <literal>c</literal>, or in queries involving both
@@ -266,7 +265,9 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor);
266265 (In a query involving <literal>a</literal> and <literal>c</literal>
267266 the planner might choose to use the index for
268267 <literal>a</literal> only and treat <literal>c</literal> like an
269- ordinary unindexed column.)
268+ ordinary unindexed column.) Of course, each column must be used with
269+ operators appropriate to the index type; clauses that involve other
270+ operators will not be considered.
270271 </para>
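The leftmost-column rule described above can be observed directly in a query plan. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the table t, the payload column, the index name t_abc_idx and the helper index_conditions() are all hypothetical. SQLite follows a similar prefix rule when no statistics have been gathered, which is the situation here.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; names below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b integer, c integer, payload text)")
conn.execute("CREATE INDEX t_abc_idx ON t (a, b, c)")


def index_conditions(where):
    """Return the plan step for a query on t with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT payload FROM t WHERE " + where
    return conn.execute(sql).fetchall()[0][-1]


all_three = index_conditions("a = 1 AND b = 2 AND c = 3")  # full index key used
a_and_b = index_conditions("a = 1 AND b = 2")              # two-column prefix used
a_and_c = index_conditions("a = 1 AND c = 3")              # gap at b: only a is used
b_only = index_conditions("b = 2")                         # no leftmost column

for label, step in [("a,b,c", all_three), ("a,b", a_and_b),
                    ("a,c", a_and_c), ("b", b_only)]:
    print(label, "->", step)
```

In the a-and-c case the index narrows the search on a alone, and the condition on c is checked against each candidate row, which matches the parenthetical remark in the text.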
271272
272273 <para>
@@ -283,8 +284,8 @@ SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> OR mino
283284 <para>
284285 Multicolumn indexes should be used sparingly. Most of the time,
285286 an index on a single column is sufficient and saves space and time.
286- Indexes with more than three columns are almost certainly
287- inappropriate.
287+ Indexes with more than three columns are unlikely to be helpful
288+ unless the usage of the table is extremely stylized.
288289 </para>
289290 </sect1>
290291
@@ -332,19 +333,19 @@ CREATE UNIQUE INDEX <replaceable>name</replaceable> ON <replaceable>table</repla
332333 </sect1>
333334
334335
335- <sect1 id="indexes-functional">
336- <title>Functional Indexes</title>
336+ <sect1 id="indexes-expressional">
337+ <title>Indexes on Expressions</title>
337338
338- <indexterm zone="indexes-functional">
339+ <indexterm zone="indexes-expressional">
339340 <primary>indexes</primary>
340- <secondary>on functions</secondary>
341+ <secondary>on expressions</secondary>
341342 </indexterm>
342343
343344 <para>
344- For a <firstterm>functional index</firstterm>, an index is defined
345- on the result of a function applied to one or more columns of a
346- single table. Functional indexes can be used to obtain fast access
347- to data based on the result of function calls.
345+ An index column need not be just a column of the underlying table,
346+ but can be a function or scalar expression computed from one or
347+ more columns of the table. This feature is useful to obtain fast
348+ access to tables based on the results of computations.
348349 </para>
349350
350351 <para>
@@ -362,20 +363,29 @@ CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));
362363 </para>
363364
364365 <para>
365- The function in the index definition can take more than one
366- argument, but they must be table columns, not constants.
367- Functional indexes are always single-column (namely, the function
368- result) even if the function uses more than one input column; there
369- cannot be multicolumn indexes that contain function calls.
366+ As another example, if one often does queries like this:
367+ <programlisting>
368+ SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';
369+ </programlisting>
370+ then it might be worth creating an index like this:
371+ <programlisting>
372+ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
373+ </programlisting>
370374 </para>
371375
372- <tip>
373- <para>
374- The restrictions mentioned in the previous paragraph can easily be
375- worked around by defining a custom function to use in the index
376- definition that computes any desired result internally.
377- </para>
378- </tip>
376+ <para>
377+ The syntax of the <command>CREATE INDEX</> command normally requires
378+ writing parentheses around index expressions, as shown in the second
379+ example. The parentheses may be omitted when the expression is just
380+ a function call, as in the first example.
381+ </para>
382+
383+ <para>
384+ Index expressions are relatively expensive to maintain, since the
385+ derived expression(s) must be computed for each row upon insertion
386+ or whenever it is updated. Therefore they should be used only when
387+ queries that can use the index are very frequent.
388+ </para>
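To see an expression index being picked up, the people_names example from this section can be run as written. The following is a rough sketch and makes several assumptions: SQLite (through Python's sqlite3 module) is used in place of PostgreSQL, the column types and the age column are invented, and plan() is a hypothetical helper. In SQLite the query must spell the expression the same way the index definition does; the second query below shows a different expression that the index cannot serve.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; column types are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name text, last_name text, age integer)")
conn.execute(
    "CREATE INDEX people_names ON people ((first_name || ' ' || last_name))"
)
conn.execute("INSERT INTO people VALUES ('John', 'Smith', 40), ('Jane', 'Doe', 35)")


def plan(sql):
    """Return the first plan step the planner reports for `sql`."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]


# Same expression as in the index definition: the index is eligible.
matching = (
    "SELECT * FROM people "
    "WHERE (first_name || ' ' || last_name) = 'John Smith'"
)
# A different expression over the same columns: the index does not apply.
non_matching = (
    "SELECT * FROM people "
    "WHERE (last_name || ' ' || first_name) = 'Smith John'"
)

matching_plan = plan(matching)
non_matching_plan = plan(non_matching)
rows = conn.execute(matching).fetchall()

print(matching_plan)
print(non_matching_plan)
print(rows)
```

Both queries return the same kind of result; only the first avoids scanning the table, which is the trade-off to weigh against the maintenance cost described above.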
379389 </sect1>
380390
381391
@@ -391,8 +401,8 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
391401 The operator class identifies the operators to be used by the index
392402 for that column. For example, a B-tree index on the type <type>int4</type>
393403 would use the <literal>int4_ops</literal> class; this operator
394- class includes comparison functions for values of type <type>int4</type>. In
395- practice the default operator class for the column's data type is
404+ class includes comparison functions for values of type <type>int4</type>.
405+ In practice the default operator class for the column's data type is
396406 usually sufficient. The main point of having operator classes is
397407 that for some data types, there could be more than one meaningful
398408 ordering. For example, we might want to sort a complex-number data
@@ -427,24 +437,25 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
427437 <literal>name_pattern_ops</literal> support B-tree indexes on
428438 the types <type>text</type>, <type>varchar</type>,
429439 <type>char</type>, and <type>name</type>, respectively. The
430- difference to the ordinary operator classes is that the values
440+ difference from the ordinary operator classes is that the values
431441 are compared strictly character by character rather than
432442 according to the locale-specific collation rules. This makes
433443 these operator classes suitable for use by queries involving
434444 pattern matching expressions (<literal>LIKE</literal> or POSIX
435445 regular expressions) if the server does not use the standard
436- <quote>C</quote> locale. As an example, to index a
446+ <quote>C</quote> locale. As an example, you might index a
437447 <type>varchar</type> column like this:
438448<programlisting>
439449CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
440450</programlisting>
441- If you do use the C locale, you should instead create an index
442- with the default operator class. Also note that you should
451+ If you do use the C locale, you may instead create an index
452+ with the default operator class, and it will still be useful
453+ for pattern-matching queries. Also note that you should
443454 create an index with the default operator class if you want
444455 queries involving ordinary comparisons to use an index. Such
445456 queries cannot use the
446457 <literal><replaceable>xxx</replaceable>_pattern_ops</literal>
447- operator classes. It is possible, however, to create multiple
458+ operator classes. It is allowed to create multiple
448459 indexes on the same column with different operator classes.
449460 </para>
450461 </listitem>