- <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.83 2007/04/15 10:56:25 ishii Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.84 2007/09/28 22:25:49 tgl Exp $ -->

 <chapter id="charset">
  <title>Localization</>
@@ -249,7 +249,7 @@ initdb --locale=sv_SE
  <title>Problems</>

  <para>
- If locale support doesn't work in spite of the explanation above,
+ If locale support doesn't work according to the explanation above,
  check that the locale support in your operating system is
  correctly configured. To check what locales are installed on your
  system, you can use the command <literal>locale -a</literal> if
@@ -301,7 +301,8 @@ initdb --locale=sv_SE

  <para>
  The character set support in <productname>PostgreSQL</productname>
- allows you to store text in a variety of character sets, including
+ allows you to store text in a variety of character sets (also called
+ encodings), including
  single-byte character sets such as the ISO 8859 series and
  multiple-byte character sets such as <acronym>EUC</> (Extended Unix
  Code), UTF-8, and Mule internal code. All supported character sets
@@ -314,6 +315,20 @@ initdb --locale=sv_SE
  databases each with a different character set.
  </para>

+ <para>
+ An important restriction, however, is that each database character set
+ must be compatible with the server's <envar>LC_CTYPE</> setting.
+ When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
+ character set is allowed, but for other settings of <envar>LC_CTYPE</>
+ there is only one character set that will work correctly.
+ Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
+ apparent flexibility to use different encodings in different databases
+ of a cluster is more theoretical than real, except when you select
+ <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
+ awareness). It is likely that these mechanisms will be revisited in future
+ versions of <productname>PostgreSQL</productname>.
+ </para>
+
  <sect2 id="multibyte-charset-supported">
  <title>Supported Character Sets</title>

@@ -716,7 +731,8 @@ initdb -E EUC_JP
  </para>

  <para>
- You can create a database with a different character set:
+ If you have selected <literal>C</> or <literal>POSIX</> locale,
+ you can create a database with a different character set:

 <screen>
 createdb -E EUC_KR korean
@@ -731,7 +747,7 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';
 </programlisting>

  The encoding for a database is stored in the system catalog
- <literal>pg_database</literal>. You can see that by using the
+ <literal>pg_database</literal>. You can see it by using the
  <option>-l</option> option or the <command>\l</command> command
  of <command>psql</command>.

@@ -756,26 +772,23 @@ $ <userinput>psql -l</userinput>

  <important>
  <para>
- Although you can specify any encoding you want for a database, it is
- unwise to choose an encoding that is not what is expected by the locale
- you have selected. The <literal>LC_COLLATE</literal> and
- <literal>LC_CTYPE</literal> settings imply a particular encoding,
- and locale-dependent operations (such as sorting) are likely to
- misinterpret data that is in an incompatible encoding.
- </para>
-
- <para>
- Since these locale settings are frozen by <command>initdb</>, the
- apparent flexibility to use different encodings in different databases
- of a cluster is more theoretical than real. It is likely that these
- mechanisms will be revisited in future versions of
- <productname>PostgreSQL</productname>.
+ On most modern operating systems, <productname>PostgreSQL</productname>
+ can determine which character set is implied by an <envar>LC_CTYPE</>
+ setting, and it will enforce that only the correct database encoding is
+ used. On older systems it is your responsibility to ensure that you use
+ the encoding expected by the locale you have selected. A mistake in
+ this area is likely to lead to strange misbehavior of locale-dependent
+ operations such as sorting.
  </para>

  <para>
- One way to use multiple encodings safely is to set the locale to
- <literal>C</> or <literal>POSIX</> during <command>initdb</>, thus
- disabling any real locale awareness.
+ <productname>PostgreSQL</productname> will allow superusers to create
+ databases with <literal>SQL_ASCII</> encoding even when
+ <envar>LC_CTYPE</> is not <literal>C</> or <literal>POSIX</>. As noted
+ above, <literal>SQL_ASCII</> does not enforce that the data stored in
+ the database has any particular encoding, and so this choice poses risks
+ of locale-dependent misbehavior. Using this combination of settings is
+ deprecated and may someday be forbidden altogether.
  </para>
  </important>
  </sect2>
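
Note: the rule that the new paragraphs in this diff describe can be summarized as a small decision function. The sketch below is only an illustrative model of the documented behavior, not PostgreSQL's actual implementation. The function name `encoding_allowed`, its parameters, and the idea of passing in the locale's implied encoding (or `None` when the system cannot determine it) are all assumptions made for the example. The pairing of `sv_SE` with `LATIN1` in the usage lines is likewise assumed.

```python
# Illustrative model of the documented rule; NOT PostgreSQL source code.
# All names here (encoding_allowed, LOCALE_NEUTRAL, parameters) are hypothetical.

LOCALE_NEUTRAL = {"C", "POSIX"}  # locales under which any encoding is allowed


def encoding_allowed(lc_ctype, implied_encoding, requested, is_superuser=False):
    """Return True if a database using encoding `requested` would be accepted.

    lc_ctype         -- the LC_CTYPE value frozen by initdb
    implied_encoding -- encoding implied by lc_ctype, or None if the system
                        cannot determine it (the "older systems" case)
    requested        -- encoding asked for at database creation
    is_superuser     -- whether the creating user is a superuser
    """
    # C / POSIX: locale awareness is effectively disabled, anything goes.
    if lc_ctype in LOCALE_NEUTRAL:
        return True
    # Older systems: nothing is enforced; correctness is the user's job.
    if implied_encoding is None:
        return True
    # Otherwise only the single encoding implied by the locale is correct.
    if requested == implied_encoding:
        return True
    # Deprecated exception: superusers may still choose SQL_ASCII.
    if requested == "SQL_ASCII":
        return is_superuser
    return False


# Usage (the sv_SE -> LATIN1 pairing is assumed for illustration):
print(encoding_allowed("C", None, "EUC_KR"))
print(encoding_allowed("sv_SE", "LATIN1", "EUC_KR"))
```

Passing the implied encoding as an argument keeps the sketch free of any invented locale-to-encoding table; a real server derives that value from the operating system.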