1515 Using the locale features of the operating system to provide
1616 locale-specific collation order, number formatting, translated
1717 messages, and other aspects.
18+ This is covered in <xref linkend="locale"> and
19+ <xref linkend="collation">.
1820 </para>
1921 </listitem>
2022
2325 Providing a number of different character sets to support storing text
2426 in all kinds of languages, and providing character set translation
2527 between client and server.
28+ This is covered in <xref linkend="multibyte">.
2629 </para>
2730 </listitem>
2831 </itemizedlist>
@@ -138,9 +141,12 @@ initdb --locale=sv_SE
138141 fixed when the database is created. You can use different settings
139142 for different databases, but once a database is created, you cannot
140143 change them for that database anymore. <literal>LC_COLLATE</literal>
141- and <literal>LC_CTYPE</literal> are these type of categories. They affect
144+ and <literal>LC_CTYPE</literal> are these categories. They affect
142145 the sort order of indexes, so they must be kept fixed, or indexes on
143- text columns would become corrupt. The default values for these
146+ text columns would become corrupt.
147+ (But you can alleviate this restriction using collations, as discussed
148+ in <xref linkend="collation">.)
149+ The default values for these
144150 categories are determined when <command>initdb</command> is run, and
145151 those values are used when new databases are created, unless
146152 specified otherwise in the <command>CREATE DATABASE</command> command.
@@ -153,7 +159,7 @@ initdb --locale=sv_SE
153159 linkend="runtime-config-client-format"> for details). The values
154160 that are chosen by <command>initdb</command> are actually only written
155161 into the configuration file <filename>postgresql.conf</filename> to
156- serve as defaults when the server is started. If you disable these
162+ serve as defaults when the server is started. If you remove these
157163 assignments from <filename>postgresql.conf</filename> then the
158164 server will inherit the settings from its execution environment.
159165 </para>
@@ -308,66 +314,69 @@ initdb --locale=sv_SE
308314 <title>Collation Support</title>
309315
310316 <para>
311- The collation support allows specifying the sort order and certain
312- other locale aspects of data per column or per operation at run
313- time. This alleviates the problem that the
317+ The collation feature allows specifying the sort order and certain
318+ other locale aspects of data per-column, or even per-operation.
319+ This alleviates the restriction that the
314320 <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
315321 of a database cannot be changed after its creation.
316322 </para>
317323
318324 <note>
319325 <para>
320- The collation support feature is currently only known to work on
321- Linux/glibc and Mac OS X platforms.
326+ Collation support is currently only known to work on
327+ Linux (glibc) and Mac OS X platforms.
322328 </para>
323329 </note>
324330
325331 <sect2>
326332 <title>Concepts</title>
327333
328334 <para>
329- Conceptually, every datum of a collatable data type has a
330- collation. (Collatable data types in the base system are
335+ Conceptually, every expression of a collatable data type has a
336+ collation. (The built-in collatable data types are
331337 <type>text</type>, <type>varchar</type>, and <type>char</type>.
332338 User-defined base types can also be marked collatable.) If the
333- datum is a column reference, the collation of the datum is the
334- defined collation of the column. If the datum is a constant, the
339+ expression is a column reference, the collation of the expression is the
340+ defined collation of the column. If the expression is a constant, the
335341 collation is the default collation of the data type of the
336- constant. The collation of more complex expressions is derived
337- from the input collations as described below.
342+ constant. The collation of a more complex expression is derived
343+ from the collations of its inputs, as described below.
338344 </para>
339345
340346 <para>
341- The collation of a datum can also be the <quote>default</quote>
342- collation, which reverts to the locale settings defined for the
343- database. In some cases, a datum can also have no known
347+ The collation of an expression can be the <quote>default</quote>
348+ collation, which means the locale settings defined for the
349+ database. In some cases, an expression can also have no known
344350 collation. In such cases, ordering operations and other
345351 operations that need to know the collation will fail.
346352 </para>
347353
348354 <para>
349355 When the database system has to perform an ordering or a
350- comparison, it considers the collation of the input data. This
351- happens in two situations: an <literal>ORDER BY</literal> clause
352- and a function or operator call such as <literal>&lt;</literal>.
353- The collation to apply for the performance of the <literal>ORDER
354- BY</literal> clause is simply the collation of the sort key. The
355- collation to apply for a function or operator call is derived from
356- the arguments, as described below. Additionally, collations are
357- taken into account by functions that convert between lower and
358- upper case letters, that is, <function>lower</function>,
359- <function>upper</function>, and <function>initcap</function>.
356+ comparison, it uses the collation of the input expression. This
357+ happens, for example, with <literal>ORDER BY</literal> clauses
358+ and function or operator calls such as <literal>&lt;</literal>.
359+ The collation to apply for an <literal>ORDER BY</literal> clause
360+ is simply the collation of the sort key. The collation to apply for a
361+ function or operator call is derived from the arguments, as described
362+ below. In addition to comparison operators, collations are taken into
363+ account by functions that convert between lower and upper case
364+ letters, such as <function>lower</>, <function>upper</>, and
365+ <function>initcap</>.
360366 </para>
361367
362368 <para>
363- For a function call, the collation that is derived from combining
364- the argument collations is both used for performing any
365- comparisons or ordering and for the collation of the function
366- result, if the result type is collatable.
369+ For a function or operator call, the collation that is derived by
370+ examining the argument collations is used at run time for performing
371+ the specified operation. If the result of the function or operator
372+ call is of a collatable data type, the collation is also used at parse
373+ time as the defined collation of the function or operator expression,
374+ in case there is a surrounding expression that requires knowledge of
375+ its collation.
367376 </para>
368377
369378 <para>
370- The <firstterm>collation derivation</firstterm> of a datum can be
379+ The <firstterm>collation derivation</firstterm> of an expression can be
371380 implicit or explicit. This distinction affects how collations are
372381 combined when multiple different collations appear in an
373382 expression. An explicit collation derivation arises when a
@@ -379,18 +388,18 @@ initdb --locale=sv_SE
379388 <orderedlist>
380389 <listitem>
381390 <para>
382- If any input item has an explicit collation derivation, then
383- all explicitly derived collations among the input items must be
384- the same, otherwise an error is raised. If an explicitly
391+ If any input expression has an explicit collation derivation, then
392+ all explicitly derived collations among the input expressions must be
393+ the same, otherwise an error is raised. If any explicitly
385394 derived collation is present, that is the result of the
386395 collation combination.
387396 </para>
388397 </listitem>
389398
390399 <listitem>
391400 <para>
392- Otherwise, all input items must have the same implicit
393- collation derivation or the default collation. If an
401+ Otherwise, all input expressions must have the same implicit
402+ collation derivation or the default collation. If any
394403 implicitly derived collation is present, that is the result of
395404 the collation combination. Otherwise, the result is the
396405 default collation.
@@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
428437 A collation is an SQL schema object that maps an SQL name to
429438 operating system locales. In particular, it maps to a combination
430439 of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>. (As
431- the name would indicate, the main purpose of a collation is to set
440+ the name would suggest, the main purpose of a collation is to set
432441 <symbol>LC_COLLATE</symbol>, which controls the sort order. But
433442 it is rarely necessary in practice to have an
434443 <symbol>LC_CTYPE</symbol> setting that is different from
435444 <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
436445 these under one concept than to create another infrastructure for
437- setting <symbol>LC_CTYPE</symbol> per datum.) Also, a collation
438- is tied to a character encoding. The same collation name may
439- exist for different encodings.
446+ setting <symbol>LC_CTYPE</symbol> per expression.) Also, a collation
447+ is tied to a character set encoding (see <xref linkend="multibyte">).
448+ The same collation name may exist for different encodings.
440449 </para>
441450
442451 <para>
443- When a database system is initialized, <command>initdb</command>
452+ When a database cluster is initialized, <command>initdb</command>
444453 populates the system catalog <literal>pg_collation</literal> with
445454 collations based on all the locales it finds on the operating
446455 system at the time. For example, the operating system might
@@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
463472 collation may be created using
464473 the <xref linkend="sql-createcollation"> command. That command
465474 can also be used to create a new collation from an existing
466- collation, which can be useful to be able to use operating-system
467- independent collation names in applications.
475+ collation, which can be useful to be able to use
476+ operating-system-independent collation names in applications.
477+ </para>
478+
479+ <para>
480+ Within any particular database, only collations that use that
481+ database's encoding are of interest. Other entries in
482+ <literal>pg_collation</literal> are ignored. Thus, a stripped collation
483+ name such as <literal>de_DE</literal> can be considered unique
484+ within a given database even though it would not be unique globally.
485+ Use of the stripped collation names is recommended, since it will
486+ make one less thing you need to change if you decide to change to
487+ another database encoding.
468488 </para>
469489 </sect2>
470490 </sect1>