@@ -377,10 +377,13 @@ initdb --locale-provider=icu --icu-locale=en
377377 variants and customization options.
378378 </para>
379379 </sect2>
380+
380381 <sect2 id="icu-locales">
381382 <title>ICU Locales</title>
383+
382384 <sect3 id="icu-locale-names">
383385 <title>ICU Locale Names</title>
386+
384387 <para>
385388 The ICU format for the locale name is a <link
386389 linkend="icu-language-tag">Language Tag</link>.
@@ -412,16 +415,19 @@ NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
412415 linkend="icu-language-tag">language tag</link> instead of relying on the
413416 transformation.
414417 </para>
418+
415419 <para>
416420 A locale with no language name, or the special language name
417421 <literal>root</literal>, is transformed to have the language
418422 <literal>und</literal> ("undefined").
419423 </para>
424+
420425 <para>
421426 ICU can transform most libc locale names, as well as some other formats,
422427 into language tags for easier transition to ICU. If a libc locale name is
423428 used in ICU, it may not have precisely the same behavior as in libc.
424429 </para>
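The transformation of libc locale names is done by ICU itself. As a rough illustration only, the hypothetical helper `libc_name_to_tag` below reproduces the simplest cases mentioned here: it drops the codeset (so `de_DE.utf8` becomes `de-DE`, matching the NOTICE shown earlier), maps an empty name or `root` to `und`, and refuses `@` modifiers, which need ICU's real logic. It is a sketch under those assumptions, not ICU's algorithm.

```python
def libc_name_to_tag(name: str) -> str:
    """Hypothetical helper approximating the simplest libc-to-ICU mapping."""
    if "@" in name:
        # Modifiers such as "@euro" or "@latin" need real ICU logic.
        raise ValueError(f"modifier not handled by this sketch: {name!r}")
    # Drop the codeset, e.g. ".utf8" in "de_DE.utf8".
    base = name.partition(".")[0]
    # An empty language name, or the special name "root", becomes "und".
    if base in ("", "root"):
        return "und"
    # libc separates language and region with "_"; language tags use "-".
    return base.replace("_", "-")


print(libc_name_to_tag("de_DE.utf8"))  # prints de-DE
```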
430+
425431 <para>
426432 If there is a problem interpreting the locale name, or if the locale name
427433 represents a language or region that ICU does not recognize, you will see
@@ -442,10 +448,12 @@ CREATE COLLATION
442448
443449 <sect3 id="icu-language-tag">
444450 <title>Language Tag</title>
451+
445452 <para>
446453 A language tag, defined in BCP 47, is a standardized identifier for
447454 languages, regions, and other information about a locale.
448455 </para>
456+
449457 <para>
450458 Basic language tags are simply
451459 <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
@@ -457,13 +465,15 @@ CREATE COLLATION
457465 <literal>ja-JP</literal>, <literal>de</literal>, or
458466 <literal>fr-CA</literal>.
459467 </para>
468+
460469 <para>
461470 Collation settings may be included in the language tag to customize
462471 collation behavior. ICU allows extensive customization, such as
463472 sensitivity (or insensitivity) to accents, case, and punctuation;
464473 treatment of digits within text; and many other options to satisfy a
465474 variety of uses.
466475 </para>
476+
467477 <para>
468478 To include this additional collation information in a language tag,
469479 append <literal>-u</literal>, which indicates there are additional
@@ -477,6 +487,7 @@ CREATE COLLATION
477487 <literal>-</literal><replaceable>value</replaceable>, which implies a
478488 value of <literal>true</literal>.
479489 </para>
490+
480491 <para>
481492 For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
482493 means the locale with the English language in the US region, with
@@ -500,13 +511,15 @@ SELECT 'N-45' < 'N-123' COLLATE mycollation5 as result;
500511(1 row)
501512</screen>
502513 </para>
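To make the structure of a tag such as `en-US-u-kn-ks-level2` concrete, the hypothetical `parse_language_tag` function below splits a simple tag into language, optional region, and the key/value pairs after `-u`. Following the rule described above, a key with no value is recorded as `true`. It is a simplified sketch: scripts, variants, `-u-` attributes, and other extensions are not modeled, and PostgreSQL passes the tag to ICU rather than parsing it like this.

```python
def parse_language_tag(tag: str) -> dict:
    """Hypothetical helper: split a simple BCP 47 tag into language, region,
    and the key/value pairs of its -u- extension."""
    subtags = tag.lower().split("-")
    result = {"language": subtags[0], "region": None, "keywords": {}}
    i = 1
    # Optional region: two letters (US) or three digits (419).
    if i < len(subtags) and ((len(subtags[i]) == 2 and subtags[i].isalpha())
                             or (len(subtags[i]) == 3 and subtags[i].isdigit())):
        result["region"] = subtags[i].upper()
        i += 1
    # Skip ahead to the "u" singleton that introduces the collation settings.
    while i < len(subtags) and subtags[i] != "u":
        i += 1
    values = {}
    key = None
    for sub in subtags[i + 1:]:
        if len(sub) == 1:      # another singleton ends the -u- extension
            break
        if len(sub) == 2:      # two-character subtags are keys
            key = sub
            values[key] = []
        elif key is not None:  # longer subtags form the key's value
            values[key].append(sub)
    # A key given without a value implies "true".
    result["keywords"] = {k: "-".join(v) or "true" for k, v in values.items()}
    return result


print(parse_language_tag("en-US-u-kn-ks-level2"))
```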
514+
503515 <para>
504516 See <xref linkend="icu-custom-collations"/> for details and additional
505517 examples of using language tags with custom collation information for the
506518 locale.
507519 </para>
508520 </sect3>
509521 </sect2>
522+
510523 <sect2 id="locale-problems">
511524 <title>Problems</title>
512525
@@ -1100,6 +1113,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
11001113 </tip>
11011114 </sect3>
11021115 </sect2>
1116+
11031117 <sect2 id="icu-custom-collations">
11041118 <title>ICU Custom Collations</title>
11051119
@@ -1129,23 +1143,26 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
11291143 linkend="icu-collation-settings"/>, or see <xref
11301144 linkend="icu-external-references"/> for more details.
11311145 </para>
1146+
11321147 <sect3 id="icu-collation-comparison-levels">
11331148 <title>ICU Comparison Levels</title>
1149+
11341150 <para>
11351151 Comparison of two strings (collation) in ICU is determined by a
11361152 multi-level process, where textual features are grouped into
11371153 "levels". Treatment of each level is controlled by the <link
11381154 linkend="icu-collation-settings-table">collation settings</link>. Higher
11391155 levels correspond to finer textual features.
11401156 </para>
1157+
11411158 <para>
11421159 <xref linkend="icu-collation-levels"/> shows which textual feature
11431160 differences are considered significant when determining equality at the
11451162 given level. The Unicode character <literal>U+2063</literal> is an
11461163 invisible separator and, as seen in the table, is ignored at all
11461163 levels of comparison less than <literal>identic</literal>.
11471164 </para>
1148- <para>
1165+
11491166 <table id="icu-collation-levels">
11501167 <title>ICU Collation Levels</title>
11511168 <tgroup cols="8">
@@ -1157,6 +1174,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
11571174 <colspec colname="col6" colwidth="1*"/>
11581175 <colspec colname="col7" colwidth="1*"/>
11591176 <colspec colname="col8" colwidth="1*"/>
1177+
11601178 <thead>
11611179 <row>
11621180 <entry>Level</entry>
@@ -1169,6 +1187,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
11691187 <entry><literal>'y' = 'z'</literal></entry>
11701188 </row>
11711189 </thead>
1190+
11721191 <tbody>
11731192 <row>
11741193 <entry>level1</entry>
@@ -1224,6 +1243,7 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
12241243 </tgroup>
12251244 </table>
12261245
1246+ <para>
12271247 At every level, even with full normalization off, basic normalization is
12281248 performed. For example, <literal>'á'</literal> may be composed of the
12291249 code points <literal>U&'\0061\0301'</literal> or the single code
@@ -1233,9 +1253,9 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
12331253 created with <symbol>deterministic</symbol> set to
12341254 <literal>true</literal>.
12351255 </para>
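The two spellings of <literal>'á'</literal> mentioned above are canonically equivalent in Unicode. The short Python snippet below only illustrates that equivalence using the standard `unicodedata` module; it is not how ICU or PostgreSQL implements the comparison.

```python
import unicodedata

decomposed = "\u0061\u0301"  # 'a' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"       # LATIN SMALL LETTER A WITH ACUTE

# Compared code point by code point, the two spellings differ.
print(decomposed == precomposed)  # prints False

# They are canonically equivalent, so normalizing both (NFC here) makes them equal.
print(unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFC", precomposed))  # prints True
```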
1256+
12361257 <sect4 id="icu-collation-level-examples">
12371258 <title>Collation Level Examples</title>
1238- <para>
12391259
12401260<programlisting>
12411261CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
@@ -1251,25 +1271,26 @@ SELECT 'x-y' = 'x_y' COLLATE level3; -- true
12511271SELECT 'x-y' = 'x_y' COLLATE level4; -- false
12521272</programlisting>
12531273
1254- </para>
12551274 </sect4>
12561275 </sect3>
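As a mental model of the levels, the toy key function below keeps only the differences that become significant at each level: level1 sees base letters, level2 adds accents, level3 adds case, level4 adds punctuation, and only identic sees the invisible separator <literal>U+2063</literal>. It assumes the <literal>ka-shifted</literal> setting used in the examples above, and the helpers `toy_key` and `toy_equal` are hypothetical. ICU's real algorithm uses weighted sort keys and also defines ordering, not just equality.

```python
import unicodedata

INVISIBLE_SEPARATOR = "\u2063"  # ignored below the identic level


def toy_key(s: str, level: str) -> str:
    """Hypothetical toy key, assuming ka-shifted; not ICU's real algorithm."""
    # Canonically equivalent sequences always compare equal, so normalize first.
    s = unicodedata.normalize("NFD", s)
    if level == "identic":
        return s
    s = s.replace(INVISIBLE_SEPARATOR, "")
    if level != "level4":
        # With ka-shifted, punctuation is ignored at levels 1 to 3.
        s = "".join(ch for ch in s if not unicodedata.category(ch).startswith("P"))
    if level == "level1":
        # Level 1 compares base letters only: drop combining accents.
        s = "".join(ch for ch in s if not unicodedata.combining(ch))
    if level in ("level1", "level2"):
        # Case differences first matter at level 3.
        s = s.casefold()
    return s


def toy_equal(a: str, b: str, level: str) -> bool:
    return toy_key(a, level) == toy_key(b, level)


for level in ("level1", "level2", "level3", "level4", "identic"):
    print(level,
          toy_equal("ab", "a\u2063b", level),
          toy_equal("x-y", "x_y", level),
          toy_equal("g", "G", level),
          toy_equal("n", "\u00f1", level))
```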
12571276
12581277 <sect3 id="icu-collation-settings">
12591278 <title>Collation Settings for an ICU Locale</title>
1279+
12601280 <para>
12611281 <xref linkend="icu-collation-settings-table"/> shows the available
12621282 collation settings, which can be used as part of a language tag to
12631283 customize a collation.
12641284 </para>
1265- <para>
1285+
12661286 <table id="icu-collation-settings-table">
12671287 <title>ICU Collation Settings</title>
12681288 <tgroup cols="4">
12691289 <colspec colname="col1" colwidth="1*"/>
12701290 <colspec colname="col2" colwidth="2*"/>
12711291 <colspec colname="col3" colwidth="2*"/>
12721292 <colspec colname="col4" colwidth="5*"/>
1293+
12731294 <thead>
12741295 <row>
12751296 <entry>Key</entry>
@@ -1278,6 +1299,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
12781299 <entry>Description</entry>
12791300 </row>
12801301 </thead>
1302+
12811303 <tbody>
12821304 <row>
12831305 <entry><literal>co</literal></entry>
@@ -1287,6 +1309,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
12871309 Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
12881310 </entry>
12891311 </row>
1312+
12901313 <row>
12911314 <entry><literal>ka</literal></entry>
12921315 <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
@@ -1299,6 +1322,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
12991322 character classes are ignored.
13001323 </entry>
13011324 </row>
1325+
13021326 <row>
13031327 <entry><literal>kb</literal></entry>
13041328 <entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1309,6 +1333,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13091333 before <literal>'aé'</literal>.
13101334 </entry>
13111335 </row>
1336+
13121337 <row>
13131338 <entry><literal>kc</literal></entry>
13141339 <entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1325,6 +1350,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13251350 </para>
13261351 </entry>
13271352 </row>
1353+
13281354 <row>
13291355 <entry><literal>kf</literal></entry>
13301356 <entry>
@@ -1339,6 +1365,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13391365 the rules of the locale.
13401366 </entry>
13411367 </row>
1368+
13421369 <row>
13431370 <entry><literal>kn</literal></entry>
13441371 <entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1350,6 +1377,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13501377 <literal>'id-123'</literal>.
13511378 </entry>
13521379 </row>
1380+
13531381 <row>
13541382 <entry><literal>kk</literal></entry>
13551383 <entry><literal>true</literal>, <literal>false</literal></entry>
@@ -1373,6 +1401,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13731401 </para>
13741402 </entry>
13751403 </row>
1404+
13761405 <row>
13771406 <entry><literal>kr</literal></entry>
13781407 <entry>
@@ -1398,6 +1427,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
13981427 </para>
13991428 </entry>
14001429 </row>
1430+
14011431 <row>
14021432 <entry><literal>ks</literal></entry>
14031433 <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
@@ -1409,6 +1439,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
14091439 <xref linkend="icu-collation-levels"/> for details.
14101440 </entry>
14111441 </row>
1442+
14121443 <row>
14131444 <entry><literal>kv</literal></entry>
14141445 <entry>
@@ -1429,10 +1460,13 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
14291460 </tbody>
14301461 </tgroup>
14311462 </table>
1432- Defaults may depend on locale. The above table is not meant to be
1433- complete. See <xref linkend="icu-external-references"/> for additional
1434- options and details.
1463+
1464+ <para>
1465+ Defaults may depend on locale. The above table is not meant to be
1466+ complete. See <xref linkend="icu-external-references"/> for additional
1467+ options and details.
14351468 </para>
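The <literal>kn</literal> setting in the table treats a run of digits as one numeric value, so <literal>'id-45'</literal> sorts before <literal>'id-123'</literal>. The hypothetical `numeric_key` function below approximates that idea as a Python sort key; it is a sketch of "natural" ordering, not ICU's implementation, and it ignores every other collation setting.

```python
import re


def numeric_key(s: str) -> list:
    """Hypothetical sort key approximating kn-true: digit runs compare by value."""
    key = []
    # With a capturing group, re.split puts the digit runs at odd indexes.
    for i, part in enumerate(re.split(r"(\d+)", s)):
        if i % 2:
            key.append((0, int(part)))
        elif part:
            # Tagging text with 1 and numbers with 0 avoids comparing int to str.
            key.append((1, part))
    return key


names = ["id-123", "id-45"]
print(sorted(names))                   # plain code point order: ['id-123', 'id-45']
print(sorted(names, key=numeric_key))  # numeric order: ['id-45', 'id-123']
```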
1469+
14361470 <note>
14371471 <para>
14381472 For many collation settings, you must create the collation with
@@ -1448,7 +1482,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
14481482
14491483 <sect3 id="icu-locale-examples">
14501484 <title>Examples</title>
1451- <para>
1485+
14521486 <variablelist>
14531487 <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
14541488 <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
@@ -1494,22 +1528,21 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
14941528 </listitem>
14951529 </varlistentry>
14961530 </variablelist>
1497- </para>
14981531 </sect3>
14991532
15001533 <sect3 id="icu-external-references">
15011534 <title>External References for ICU</title>
1535+
15021536 <para>
15031537 This section (<xref linkend="icu-custom-collations"/>) is only a brief
15041538 overview of ICU behavior and language tags. Refer to the following
15051539 documents for technical details, additional options, and new behavior:
15061540 </para>
1541+
15071542 <itemizedlist>
15081543 <listitem>
15091544 <para>
1510- <ulink
1511- url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
1512- Technical Standard #35</ulink>
1545+ <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
15131546 </para>
15141547 </listitem>
15151548 <listitem>
@@ -1519,8 +1552,7 @@ SELECT 'x-y' = 'x_y' COLLATE level4; -- false
15191552 </listitem>
15201553 <listitem>
15211554 <para>
1522- <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
1523- repository</ulink>
1555+ <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
15241556 </para>
15251557 </listitem>
15261558 <listitem>