author Michael Kerrisk <mtk.manpages@gmail.com> 2014-06-24 11:57:00 +0200
committer Michael Kerrisk <mtk.manpages@gmail.com> 2014-06-24 11:58:39 +0200
commit 9423e95b07d386022a4208aaf79840d4adf194d1
tree a8be2ab681ba30cb862ccbadbe4c649c480ec287
parent 66676c9124d646284524c0c9b260c571da7e9221
unicode.7: Minor formatting fixes

There's no need really to boldface names of standards and character sets.

Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Diffstat (limited to 'man7/unicode.7')
-rw-r--r-- man7/unicode.7 65
1 files changed, 21 insertions, 44 deletions
diff --git a/man7/unicode.7 b/man7/unicode.7
index 4884b566c4..c616f50192 100644
--- a/man7/unicode.7
+++ b/man7/unicode.7
@@ -30,13 +30,10 @@
.SH NAME
Unicode \- universal character set
.SH DESCRIPTION
-The international standard
-.B ISO 10646
-defines the
-.BR "Universal Character Set (UCS)" .
+The international standard ISO 10646 defines the
+Universal Character Set (UCS).
UCS contains all characters of all other character set standards.
-It also guarantees
-.BR "round-trip compatibility";
+It also guarantees "round-trip compatibility";
in other words,
conversion tables can be built such that no information is lost
when a string is converted from any other encoding to UCS and back.
@@ -74,14 +71,12 @@ made up of 256 8-bit
with 256
.I column
positions, one for each character.
-Part 1 of the standard
-.RB ( "ISO 10646-1" )
+Part 1 of the standard (ISO 10646-1)
defines the first 65534 code positions (0x0000 to 0xfffd), which form
the
.IR "Basic Multilingual Plane (BMP)" ,
that is plane 0 in group 0.
-Part 2 of the standard
-.RB ( "ISO 10646-2" )
+Part 2 of the standard (ISO 10646-2)
adds characters to group 0 outside the BMP in several
.I "supplementary planes"
in the range 0x10000 to 0x10ffff.
@@ -97,27 +92,20 @@ dictionary printing, publishing industry, higher-level protocol and
enthusiast needs.
.PP
The representation of each UCS character as a 2-byte word is referred
-to as the
-.B UCS-2
-form (only for BMP characters), whereas
-.B UCS-4
-is the representation of each character by a 4-byte word.
-In addition, there exist two encoding forms
-.B UTF-8
-for backward compatibility with ASCII processing software and
-.B UTF-16
+to as the UCS-2 form (only for BMP characters),
+whereas UCS-4 is the representation of each character by a 4-byte word.
+In addition, there exist two encoding forms UTF-8
+for backward compatibility with ASCII processing software and UTF-16
for the backward-compatible handling of non-BMP characters up to
0x10ffff by UCS-2 software.
.PP
The UCS characters 0x0000 to 0x007f are identical to those of the
-classic
-.B US-ASCII
+classic US-ASCII
character set and the characters in the range 0x0000 to 0x00ff
are identical to those in
-.BR "ISO 8859-1 Latin-1" .
+ISO 8859-1 (Latin-1).
.SS Combining characters
-Some code points in
-.B UCS
+Some code points in UCS
have been assigned to
.IR "combining characters" .
These are similar to the nonspacing accent keys on a typewriter.
@@ -143,8 +131,7 @@ combining characters, ISO 10646-1 specifies the following three
of UCS:
.TP 0.9i
Level 1
-Combining characters and
-.B Hangul Jamo
+Combining characters and Hangul Jamo
(a variant encoding of the Korean script, where a Hangul syllable
glyph is coded as a triplet or pair of vovel/consonant codes) are not
supported.
@@ -155,19 +142,13 @@ languages where they are essential (e.g., Thai, Lao, Hebrew,
Arabic, Devanagari, Malayalam).
.TP
Level 3
-All
-.B UCS
-characters are supported.
+All UCS characters are supported.
.PP
-The
-.B Unicode 3.0 Standard
-published by the
-.B Unicode Consortium
-contains exactly the
-.B UCS Basic Multilingual Plane
+The Unicode 3.0 Standard
+published by the Unicode Consortium
+contains exactly the UCS Basic Multilingual Plane
at implementation level 3, as described in ISO 10646-1:2000.
-.B Unicode 3.1
-added the supplemental planes of ISO 10646-2.
+Unicode 3.1 added the supplemental planes of ISO 10646-2.
The Unicode standard and
technical reports published by the Unicode Consortium provide much
additional information on the semantics and recommended usages of
@@ -180,8 +161,7 @@ Under GNU/Linux, the C type
.I wchar_t
is a signed 32-bit integer type.
Its values are always interpreted
-by the C library as
-.B UCS
+by the C library as UCS
code values (in all locales), a convention that is signaled by the GNU
C library to applications by defining the constant
.B __STDC_ISO_10646__
@@ -189,9 +169,7 @@ as specified in the ISO C99 standard.
UCS/Unicode can be used just like ASCII in input/output streams,
terminal communication, plaintext files, filenames, and environment
-variables in the ASCII compatible
-.B UTF-8
-multibyte encoding.
+variables in the ASCII compatible UTF-8 multibyte encoding.
To signal the use of UTF-8 as the character
encoding to all applications, a suitable
.I locale
@@ -236,8 +214,7 @@ Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
International Standard ISO/IEC 10646-1, International Organization
for Standardization, Geneva, 2000.
-This is the official specification of
-.BR UCS .
+This is the official specification of UCS .
Available from
.UR http://www.iso.ch/
.UE .