- <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.95 2009/05/18 08:59:28 petere Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.96 2010/02/03 17:25:05 momjian Exp $ -->
 
 <chapter id="charset">
  <title>Localization</>
 
  <para>
  This chapter describes the available localization features from the
  point of view of the administrator.
- <productname>PostgreSQL</productname> supports localization with
- two approaches:
+ <productname>PostgreSQL</productname> supports two localization
+ facilities:
 
  <itemizedlist>
  <listitem>
@@ -67,10 +67,10 @@ initdb --locale=sv_SE
  (<literal>sv</>) as spoken
  in Sweden (<literal>SE</>). Other possibilities might be
  <literal>en_US</> (U.S. English) and <literal>fr_CA</> (French
- Canadian). If more than one character set can be useful for a
+ Canadian). If more than one character set can be used for a
  locale then the specifications look like this:
- <literal>cs_CZ.ISO8859-2</>. What locales are available under what
- names on your system depends on what was provided by the operating
+ <literal>cs_CZ.ISO8859-2</>. What locales are available on your
+ system under what names depends on what was provided by the operating
  system vendor and what was installed. On most Unix systems, the command
  <literal>locale -a</> will provide a list of available locales.
  Windows uses more verbose locale names, such as <literal>German_Germany</>
@@ -80,8 +80,8 @@ initdb --locale=sv_SE
  <para>
  Occasionally it is useful to mix rules from several locales, e.g.,
  use English collation rules but Spanish messages. To support that, a
- set of locale subcategories exist that control only a certain
- aspect of the localization rules:
+ set of locale subcategories exist that control only certain
+ aspects of the localization rules:
 
  <informaltable>
  <tgroup cols="2">
@@ -127,13 +127,13 @@ initdb --locale=sv_SE
  </para>
 
  <para>
- The nature of some locale categories is that their value has to be
+ Some locale categories must have their values
  fixed when the database is created. You can use different settings
  for different databases, but once a database is created, you cannot
  change them for that database anymore. <literal>LC_COLLATE</literal>
- and <literal>LC_CTYPE</literal> are these categories. They affect
+ and <literal>LC_CTYPE</literal> are these types of categories. They affect
  the sort order of indexes, so they must be kept fixed, or indexes on
- text columns will become corrupt. The default values for these
+ text columns would become corrupt. The default values for these
  categories are determined when <command>initdb</command> is run, and
  those values are used when new databases are created, unless
  specified otherwise in the <command>CREATE DATABASE</command> command.
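The reason `LC_COLLATE` has to stay fixed is that the same strings sort differently under different collations, so an index built under one ordering is wrong under another. A minimal sketch of the effect, assuming a POSIX shell and using the operating system's `sort(1)` rather than PostgreSQL itself (the server, like `sort`, relies on the C library's locale-aware string comparison); `sort_with_locale` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: sort a fixed word list under the given locale.
# sort(1) compares lines according to LC_COLLATE; LC_ALL overrides it.
sort_with_locale() {
    printf '%s\n' b A a B | LC_ALL="$1" sort
}

# In the C locale, comparison is by byte value, so all uppercase ASCII
# letters sort before all lowercase ones: A, B, a, b.
sort_with_locale C

# Under a language locale such as en_US.UTF-8 (only if installed), the
# same words usually come out in a different order, which is why an
# index built under one collation is invalid under another.
# sort_with_locale en_US.UTF-8
```

Only the `C` call is executed, because it is the one locale guaranteed to exist everywhere.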
@@ -146,7 +146,7 @@ initdb --locale=sv_SE
  linkend="runtime-config-client-format"> for details). The values
  that are chosen by <command>initdb</command> are actually only written
  into the configuration file <filename>postgresql.conf</filename> to
- serve as defaults when the server is started. If you delete these
+ serve as defaults when the server is started. If you disable these
  assignments from <filename>postgresql.conf</filename> then the
  server will inherit the settings from its execution environment.
  </para>
@@ -178,7 +178,7 @@ initdb --locale=sv_SE
  settings for the purpose of setting the language of messages. If
  in doubt, please refer to the documentation of your operating
  system, in particular the documentation about
- <application>gettext</>, for more information.
+ <application>gettext</>.
  </para>
  </note>
 
@@ -320,8 +320,9 @@ initdb --locale=sv_SE
 
  <para>
  An important restriction, however, is that each database's character set
- must be compatible with the database's <envar>LC_CTYPE</> and
- <envar>LC_COLLATE</> locale settings. For <literal>C</> or
+ must be compatible with the database's <envar>LC_CTYPE</> (character
+ classification) and <envar>LC_COLLATE</> (string sort order) locale
+ settings. For <literal>C</> or
  <literal>POSIX</> locale, any character set is allowed, but for other
  locales there is only one character set that will work correctly.
  (On Windows, however, UTF-8 encoding can be used with any locale.)
@@ -543,7 +544,7 @@ initdb --locale=sv_SE
  <entry>LATIN1 with Euro and accents</entry>
  <entry>Yes</entry>
  <entry>1</entry>
- <entry>ISO885915</entry>
+ <entry><literal>ISO885915</></entry>
  </row>
  <row>
  <entry><literal>LATIN10</literal></entry>
@@ -694,7 +695,7 @@ initdb --locale=sv_SE
  </table>
 
  <para>
- Not all <acronym>API</>s support all the listed character sets. For example, the
+ Not all client <acronym>API</>s support all the listed character sets. For example, the
  <productname>PostgreSQL</>
  JDBC driver does not support <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
  <literal>LATIN8</>, and <literal>LATIN10</>.
@@ -710,7 +711,7 @@ initdb --locale=sv_SE
  much a declaration that a specific encoding is in use, as a declaration
  of ignorance about the encoding. In most cases, if you are
  working with any non-ASCII data, it is unwise to use the
- <literal>SQL_ASCII</> setting, because
+ <literal>SQL_ASCII</> setting because
  <productname>PostgreSQL</productname> will be unable to help you by
  converting or validating non-ASCII characters.
  </para>
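To make "validating" concrete: a database with a real encoding rejects byte sequences that are not valid in that encoding, while the text above says `SQL_ASCII` offers no such help. The sketch below performs the same kind of check outside the database with `iconv(1)`, converting UTF-8 to UTF-8 purely as a validity test. It is an analogy, not PostgreSQL's own code path; it assumes a GNU-style `iconv` that rejects malformed input, and `is_valid_utf8` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# Converting UTF-8 to UTF-8 makes iconv decode every byte sequence,
# so malformed input causes a non-zero exit status.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" encoded as UTF-8 (é is the two bytes 0xC3 0xA9).
printf 'caf\303\251\n' | is_valid_utf8 && echo "UTF-8 input accepted"

# "café" encoded as LATIN1 (é is the single byte 0xE9), then a space:
# 0xE9 starts a multibyte UTF-8 sequence, but 0x20 is not a valid
# continuation byte.
printf 'caf\351 au lait\n' | is_valid_utf8 || echo "LATIN1 bytes rejected as UTF-8"
```

A `SQL_ASCII` database would store both byte strings without complaint.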
@@ -720,17 +721,17 @@ initdb --locale=sv_SE
  <title>Setting the Character Set</title>
 
  <para>
- <command>initdb</> defines the default character set
+ <command>initdb</> defines the default character set (encoding)
  for a <productname>PostgreSQL</productname> cluster. For example,
 
 <screen>
 initdb -E EUC_JP
 </screen>
 
- sets the default character set (encoding) to
+ sets the default character set to
  <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You
  can use <option>--encoding</option> instead of
- <option>-E</option> if you prefer to type longer option strings.
+ <option>-E</option> if you prefer longer option strings.
  If no <option>-E</> or <option>--encoding</option> option is
  given, <command>initdb</> attempts to determine the appropriate
  encoding to use based on the specified or default locale.
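When no `-E` is given, the guess depends on the character set that the operating system associates with the chosen locale. A rough way to see that association from the shell is `locale charmap`, which reports the codeset of the current locale. Treating its output as equivalent to what `initdb` computes internally is an assumption, and `charmap_for` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the character set (codeset) that the C
# library associates with the given locale.
charmap_for() {
    LC_ALL="$1" locale charmap
}

# On glibc systems the C locale typically reports ANSI_X3.4-1968 (ASCII).
charmap_for C

# Locales such as sv_SE.UTF-8 or ko_KR.euckr (only if installed) would
# typically report UTF-8 or EUC-KR, the encoding a database using that
# locale would be expected to match.
```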
@@ -762,8 +763,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE=
  <para>
  The encoding for a database is stored in the system catalog
  <literal>pg_database</literal>. You can see it by using the
- <option>-l</option> option or the <command>\l</command> command
- of <command>psql</command>.
+ <command>psql</command> <option>-l</option> option or the
+ <command>\l</command> command.
 
 <screen>
 $ <userinput>psql -l</userinput>
@@ -784,11 +785,11 @@ $ <userinput>psql -l</userinput>
  <important>
  <para>
  On most modern operating systems, <productname>PostgreSQL</productname>
- can determine which character set is implied by an <envar>LC_CTYPE</>
+ can determine which character set is implied by the <envar>LC_CTYPE</>
  setting, and it will enforce that only the matching database encoding is
  used. On older systems it is your responsibility to ensure that you use
  the encoding expected by the locale you have selected. A mistake in
- this area is likely to lead to strange misbehavior of locale-dependent
+ this area is likely to lead to strange behavior of locale-dependent
  operations such as sorting.
  </para>
 
@@ -1190,9 +1191,9 @@ RESET client_encoding;
  <para>
  If the conversion of a particular character is not possible
  — suppose you chose <literal>EUC_JP</literal> for the
- server and <literal>LATIN1</literal> for the client, then some
- Japanese characters do not have a representation in
- <literal>LATIN1</literal> — then an error is reported.
+ server and <literal>LATIN1</literal> for the client, and some
+ Japanese characters are returned that do not have a representation in
+ <literal>LATIN1</literal> — an error is reported.
  </para>
 
  <para>
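The failure described here can be reproduced outside the database with `iconv(1)`. This is only an analogy for the server's own client/server conversion, and it starts from UTF-8 rather than `EUC_JP` so that it does not depend on `EUC_JP` support being installed. It assumes the script file is saved as UTF-8, and `to_latin1` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: convert the argument from UTF-8 to LATIN1.
# The function's exit status is iconv's: non-zero means at least one
# character has no LATIN1 representation.
to_latin1() {
    printf '%s' "$1" | iconv -f UTF-8 -t LATIN1
}

# "é" exists in LATIN1, so this conversion succeeds.
to_latin1 'café' >/dev/null && echo "café converted"

# Japanese characters have no LATIN1 representation, so iconv reports
# an error instead of producing output.
to_latin1 '日本' >/dev/null 2>&1 || echo "日本 cannot be represented in LATIN1"
```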
@@ -1249,7 +1250,8 @@ RESET client_encoding;
 
  <listitem>
  <para>
- <acronym>UTF</acronym>-8 is defined here.
+ <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
+ Format) is defined here.
  </para>
  </listitem>
  </varlistentry>