@@ -515,7 +515,7 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
515515 <para>
516516 A collation object provided by <literal>libc</literal> maps to a
517517 combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>
518- settings. (As
518+ settings, as accepted by the <literal>setlocale()</literal> system library call. (As
519519 the name would suggest, the main purpose of a collation is to set
520520 <symbol>LC_COLLATE</symbol>, which controls the sort order. But
521521 it is rarely necessary in practice to have an
@@ -640,21 +640,19 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
640640 <title>ICU collations</title>
641641
642642 <para>
643- Collations provided by ICU are created with names in BCP 47 language tag
643+ With ICU, it is not sensible to enumerate all possible locale names. ICU
644+ uses a particular naming system for locales, but there are many more ways
645+ to name a locale than there are actually distinct locales.
646+ <command>initdb</command> uses the ICU APIs to extract a set of distinct
647+ locales to populate the initial set of collations. Collations provided by
648+ ICU are created in the SQL environment with names in BCP 47 language tag
644649 format, with a <quote>private use</quote>
645650 extension <literal>-x-icu</literal> appended, to distinguish them from
646- libc locales. So <literal>de-x-icu</literal> would be an example name.
651+ libc locales.
647652 </para>
648653
649654 <para>
650- With ICU, it is not sensible to enumerate all possible locale names. ICU
651- uses a particular naming system for locales, but there are many more ways
652- to name a locale than there are actually distinct locales. (In fact, any
653- string will be accepted as a locale name.)
654- See <ulink url="http://userguide.icu-project.org/locale"></ulink> for
655- information on ICU locale naming. <command>initdb</command> uses the ICU
656- APIs to extract a set of distinct locales to populate the initial set of
657- collations. Here are some example collations that might be created:
655+ Here are some example collations that might be created:
658656
659657 <variablelist>
660658 <varlistentry>
@@ -695,32 +693,104 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
695693 will draw an error along the lines of <quote>collation "de-x-icu" for
696694 encoding "WIN874" does not exist</>.
697695 </para>
696+ </sect4>
697+ </sect3>
698+
699+ <sect3 id="collation-create">
700+ <title>Creating New Collation Objects</title>
701+
702+ <para>
703+ If the standard and predefined collations are not sufficient, users can
704+ create their own collation objects using the SQL
705+ command <xref linkend="sql-createcollation">.
706+ </para>
707+
708+ <para>
709+ The standard and predefined collations are in the
710+ schema <literal>pg_catalog</literal>, like all predefined objects.
711+ User-defined collations should be created in user schemas. This also
712+ ensures that they are saved by <command>pg_dump</command>.
713+ </para>
714+
715+ <sect4>
716+ <title>libc collations</title>
717+
718+ <para>
719+ New libc collations can be created like this:
720+ <programlisting>
721+ CREATE COLLATION german (provider = libc, locale = 'de_DE');
722+ </programlisting>
723+ The exact values that are acceptable for the <literal>locale</literal>
724+ clause in this command depend on the operating system. On Unix-like
725+ systems, the command <literal>locale -a</literal> will show a list.
726+ </para>
727+
728+ <para>
729+ Since the predefined libc collations already include all collations
730+ defined in the operating system when the database instance is
731+ initialized, it is not often necessary to manually create new ones.
732+ Possible reasons are that a different naming system is desired (in which case
733+ see also <xref linkend="collation-copy">) or that the operating system has
734+ been upgraded to provide new locale definitions (in which case see
735+ also <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link>).
736+ </para>
737+ </sect4>
738+
739+ <sect4>
740+ <title>ICU collations</title>
698741
699742 <para>
700743 ICU allows collations to be customized beyond the basic language+country
701744 set that is preloaded by <command>initdb</command>. Users are encouraged
702745 to define their own collation objects that make use of these facilities to
703- suit the sorting behavior to their requirements. Here are some examples:
746+ suit the sorting behavior to their requirements.
747+ See <ulink url="http://userguide.icu-project.org/locale"></ulink>
748+ and <ulink url="http://userguide.icu-project.org/collation/api"></ulink> for
749+ information on ICU locale naming. The set of acceptable names and
750+ attributes depends on the particular ICU version.
751+ </para>
752+
753+ <para>
754+ Here are some examples:
704755
705756 <variablelist>
706757 <varlistentry>
707- <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk')</literal></term>
758+ <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
759+ <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</literal></term>
708760 <listitem>
709761 <para>German collation with phone book collation type</para>
762+ <para>
763+ The first example selects the ICU locale using a <quote>language
764+ tag</quote> per BCP 47. The second example uses the traditional
765+ ICU-specific locale syntax. The first style is preferred going
766+ forward, but it is not supported by older ICU versions.
767+ </para>
768+ <para>
769+ Note that you can name the collation objects in the SQL environment
770+ anything you want. In this example, we follow the naming style that
771+ the predefined collations use, which in turn also follows BCP 47, but
772+ that is not required for user-defined collations.
773+ </para>
710774 </listitem>
711775 </varlistentry>
712776
713777 <varlistentry>
714- <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji')</literal></term>
778+ <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
779+ <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</literal></term>
715780 <listitem>
716781 <para>
717782 Root collation with Emoji collation type, per Unicode Technical Standard #51
718783 </para>
784+ <para>
785+ Observe how in the traditional ICU locale naming system, the root
786+ locale is selected by an empty string.
787+ </para>
719788 </listitem>
720789 </varlistentry>
721790
722791 <varlistentry>
723- <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')</literal></term>
792+ <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');</literal></term>
793+ <term><literal>CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');</literal></term>
724794 <listitem>
725795 <para>
726796 Sort digits after Latin letters. (The default is digits before letters.)
@@ -729,7 +799,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
729799 </varlistentry>
730800
731801 <varlistentry>
732- <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper')</literal></term>
802+ <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
803+ <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</literal></term>
733804 <listitem>
734805 <para>
735806 Sort upper-case letters before lower-case letters. (The default is
@@ -739,7 +810,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
739810 </varlistentry>
740811
741812 <varlistentry>
742- <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')</literal></term>
813+ <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit');</literal></term>
814+ <term><literal>CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=latn-digit');</literal></term>
743815 <listitem>
744816 <para>
745817 Combines both of the above options.
@@ -748,7 +820,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
748820 </varlistentry>
749821
750822 <varlistentry>
751- <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true')</literal></term>
823+ <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</literal></term>
824+ <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</literal></term>
752825 <listitem>
753826 <para>
754827 Numeric ordering, sorts sequences of digits by their numeric value,
@@ -768,7 +841,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
768841 repository</ulink>.
769842 The <ulink url="https://ssl.icu-project.org/icu-bin/locexp">ICU Locale
770843 Explorer</ulink> can be used to check the details of a particular locale
771- definition.
844+ definition. The examples using the <literal>k*</literal> subtags require
845+ at least ICU version 54.
772846 </para>
773847
774848 <para>
@@ -779,10 +853,21 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
779853 strings that compare equal according to the collation but are not
780854 byte-wise equal will be sorted according to their byte values.
781855 </para>
856+
857+ <note>
858+ <para>
859+ By design, ICU will accept almost any string as a locale name and match
860+ it to the closest locale it can provide, using the fallback procedure
861+ described in its documentation. Thus, there will be no direct feedback
862+ if a collation specification is composed using features that the given
863+ ICU installation does not actually support. It is therefore recommended
864+ to create application-level test cases to check that the collation
865+ definitions satisfy one's requirements.
866+ </para>
867+ </note>
782868 </sect4>
783- </sect3>
784869
785- <sect3 >
870+ <sect4 id="collation-copy" >
786871 <title>Copying Collations</title>
787872
788873 <para>
@@ -796,13 +881,7 @@ CREATE COLLATION german FROM "de_DE";
796881CREATE COLLATION french FROM "fr-x-icu";
797882</programlisting>
798883 </para>
799-
800- <para>
801- The standard and predefined collations are in the
802- schema <literal>pg_catalog</literal>, like all predefined objects.
803- User-defined collations should be created in user schemas. This also
804- ensures that they are saved by <command>pg_dump</command>.
805- </para>
884+ </sect4>
806885 </sect3>
807886 </sect2>
808887 </sect1>
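
The note added in the ICU section recommends application-level test cases, because ICU falls back silently when it does not support part of a locale specification. A minimal sketch of such a check follows, assuming a PostgreSQL build with ICU support. The collation name numeric_check and the sample strings are hypothetical and not part of the patch; only the locale string is taken from its examples.

```sql
-- Hypothetical collation name; the locale string comes from the patch's examples.
CREATE COLLATION numeric_check (provider = icu, locale = 'en-u-kn-true');

-- With numeric ordering in effect, 'a2' sorts before 'a10', so this should
-- yield true. If the ICU installation ignores the kn subtag, digits are
-- compared character by character and the result should be false instead.
SELECT 'a2'::text < 'a10'::text COLLATE numeric_check AS numeric_ordering_in_effect;

-- Clean up the test object.
DROP COLLATION numeric_check;
```

A check of this kind can be repeated for each custom collation an application relies on, for example after an ICU or operating system upgrade.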