1- <!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.240 2009/07/08 17:21:55 tgl Exp $ -->
1+ <!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.241 2009/08/04 16:08:35 tgl Exp $ -->
22
33 <chapter id="datatype">
44 <title id="datatype-title">Data Types</title>
@@ -1177,7 +1177,7 @@ SELECT b, char_length(b) FROM test2;
11771177 <para>
11781178 A binary string is a sequence of octets (or bytes). Binary
11791179 strings are distinguished from character strings in two
1180- ways: First, binary strings specifically allow storing
1180+ ways. First, binary strings specifically allow storing
11811181 octets of value zero and other <quote>non-printable</quote>
11821182 octets (usually, octets outside the range 32 to 126).
11831183 Character strings disallow zero octets, and also disallow any
@@ -1191,13 +1191,82 @@ SELECT b, char_length(b) FROM test2;
11911191 </para>
11921192
11931193 <para>
1194- When entering <type>bytea</type> values, octets of certain
1195- values <emphasis>must</emphasis> be escaped (but all octet
1196- values <emphasis>can</emphasis> be escaped) when used as part
1197- of a string literal in an <acronym>SQL</acronym> statement. In
1194+ The <type>bytea</type> type supports two external formats for
1195+ input and output: <productname>PostgreSQL</productname>'s historical
1196+ <quote>escape</quote> format, and <quote>hex</quote> format. Both
1197+ of these are always accepted on input. The output format depends
1198+ on the configuration parameter <xref linkend="guc-bytea-output">;
1199+ the default is hex. (Note that the hex format was introduced in
1200+ <productname>PostgreSQL</productname> 8.5; earlier versions and some
1201+ tools don't understand it.)
1202+ </para>
1203+
1204+ <para>
1205+ The <acronym>SQL</acronym> standard defines a different binary
1206+ string type, called <type>BLOB</type> or <type>BINARY LARGE
1207+ OBJECT</type>. The input format is different from
1208+ <type>bytea</type>, but the provided functions and operators are
1209+ mostly the same.
1210+ </para>
1211+
1212+ <sect2>
1213+ <title><type>bytea</> hex format</title>
1214+
1215+ <para>
1216+ The <quote>hex</> format encodes binary data as 2 hexadecimal digits
1217+ per byte, most significant nibble first. The entire string is
1218+ preceded by the sequence <literal>\x</literal> (to distinguish it
1219+ from the escape format). In some contexts, the initial backslash may
1220+ need to be escaped by doubling it, in the same cases in which backslashes
1221+ have to be doubled in escape format; details appear below.
1222+ The hexadecimal digits can
1223+ be either upper or lower case, and whitespace is permitted between
1224+ digit pairs (but not within a digit pair or in the starting
1225+ <literal>\x</literal> sequence).
1226+ The hex format is compatible with a wide
1227+ range of external applications and protocols, and it tends to be
1228+ faster to convert than the escape format, so its use is preferred.
1229+ </para>
1230+
1231+ <para>
1232+ Example:
1233+ <programlisting>
1234+ SELECT E'\\xDEADBEEF';
1235+ </programlisting>
1236+ </para>
1237+ </sect2>
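The hex format rules above (a leading `\x`, two hex digits per byte with the most significant nibble first, either letter case, whitespace only between digit pairs) can be sketched in Python. This is a hypothetical illustration, not PostgreSQL's server code: the names `to_bytea_hex` and `from_bytea_hex` are invented here. Both functions work on the bytea value *after* SQL string-literal processing, so the leading backslash appears once. Treating any character for which `str.isspace()` is true as permitted whitespace is an assumption.

```python
import string


def to_bytea_hex(data: bytes) -> str:
    """Render raw bytes as a hex-format bytea value: \\x + 2 hex digits per byte."""
    # bytes.hex() emits the most significant nibble of each byte first.
    return "\\x" + data.hex()


def from_bytea_hex(text: str) -> bytes:
    """Parse a hex-format bytea value (after SQL string-literal processing)."""
    # The \x prefix must be present and must not contain whitespace.
    if not text.startswith("\\x"):
        raise ValueError("hex-format bytea value must start with \\x")
    out = bytearray()
    i = 2  # skip the leading \x
    while i < len(text):
        if text[i].isspace():
            # Whitespace is tolerated between digit pairs only.
            i += 1
            continue
        pair = text[i:i + 2]
        # A pair must be two hex digits (either case); whitespace inside a
        # pair, or a dangling single digit, is rejected.
        if len(pair) != 2 or not all(ch in string.hexdigits for ch in pair):
            raise ValueError(f"invalid hex digit pair at position {i}")
        out.append(int(pair, 16))
        i += 2
    return bytes(out)


print(to_bytea_hex(b"\xde\xad\xbe\xef"))       # \xdeadbeef
print(from_bytea_hex("\\xDE AD be ef").hex())  # deadbeef
```

The Python string `"\\xDEADBEEF"` holds the same value that the `E'\\xDEADBEEF'` literal in the example produces: the doubled backslash belongs to the literal syntax, not to the hex format itself.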
1238+
1239+ <sect2>
1240+ <title><type>bytea</> escape format</title>
1241+
1242+ <para>
1243+ The <quote>escape</quote> format is the traditional
1244+ <productname>PostgreSQL</productname> format for the <type>bytea</type>
1245+ type. It
1246+ takes the approach of representing a binary string as a sequence
1247+ of ASCII characters, while converting those bytes that cannot be
1248+ represented as an ASCII character into special escape sequences.
1249+ If, from the point of view of the application, representing bytes
1250+ as characters makes sense, then this representation can be
1251+ convenient. But in practice it is usually confusing because it
1252+ blurs the distinction between binary strings and character
1253+ strings, and the particular escape mechanism that was chosen is
1254+ also somewhat unwieldy. So this format should probably be avoided
1255+ for most new applications.
1256+ </para>
1257+
1258+ <para>
1259+ When entering <type>bytea</type> values in escape format,
1260+ octets of certain
1261+ values <emphasis>must</emphasis> be escaped, while all octet
1262+ values <emphasis>can</emphasis> be escaped. In
11981263 general, to escape an octet, convert it into its three-digit
11991264 octal value and precede it
1200- by two backslashes. <xref linkend="datatype-binary-sqlesc">
1265+ by a backslash (or two backslashes, if writing the value as a
1266+ literal using escape string syntax).
1267+ Backslash itself (octet value 92) can alternatively be represented by
1268+ double backslashes.
1269+ <xref linkend="datatype-binary-sqlesc">
12011270 shows the characters that must be escaped, and gives the alternative
12021271 escape sequences where applicable.
12031272 </para>
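The escape rules above can be sketched in Python. This is a hypothetical illustration, not PostgreSQL's server code. The names `to_bytea_escape` and `from_bytea_escape` are invented here. The encoder assumes the octets to escape are backslash plus those outside the printable range 32 to 126 mentioned earlier in this section. Backslash is written in its double-backslash form, and every other escaped octet becomes a backslash followed by three octal digits. The decoder accepts both backslash forms and, as an additional assumption, requires the first octal digit to be 0 to 3 so that the value fits in one octet. Both functions operate on the bytea value after string-literal processing. Writing the result inside an `E'...'` literal would double each backslash again.

```python
def to_bytea_escape(data: bytes) -> str:
    """Render raw bytes in bytea escape format (value level, not SQL literal)."""
    parts = []
    for b in data:
        if b == 92:
            parts.append("\\\\")        # backslash as a double backslash
        elif 32 <= b <= 126:
            parts.append(chr(b))        # printable ASCII passes through
        else:
            parts.append("\\%03o" % b)  # backslash + three-digit octal
    return "".join(parts)


def from_bytea_escape(text: str) -> bytes:
    """Parse an escape-format bytea value (after string-literal processing)."""
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c != "\\":
            # Ordinary character: taken as its own octet (ASCII assumed;
            # non-ASCII raises UnicodeEncodeError, a ValueError subclass).
            out.extend(c.encode("ascii"))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(92)              # double backslash -> octet 92
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if (len(digits) == 3 and digits[0] in "0123"
                    and all(d in "01234567" for d in digits)):
                out.append(int(digits, 8))  # \ooo -> one octet
                i += 4
            else:
                raise ValueError(f"invalid escape sequence at position {i}")
    return bytes(out)


print(to_bytea_escape(bytes([0, 39, 92, 65, 255])))  # \000'\\A\377
```

The single quote (octet 39) passes through unchanged at the value level. Quoting it is a concern of the SQL string literal, not of the bytea escape format.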
@@ -1343,14 +1412,7 @@ SELECT b, char_length(b) FROM test2;
13431412 have to escape line feeds and carriage returns if your interface
13441413 automatically translates these.
13451414 </para>
1346-
1347- <para>
1348- The <acronym>SQL</acronym> standard defines a different binary
1349- string type, called <type>BLOB</type> or <type>BINARY LARGE
1350- OBJECT</type>. The input format is different from
1351- <type>bytea</type>, but the provided functions and operators are
1352- mostly the same.
1353- </para>
1415+ </sect2>
13541416 </sect1>
13551417
13561418