|
| 1 | +<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ --> |
| 2 | + |
1 | 3 | <sect1 id="unaccent"> |
2 | 4 | <title>unaccent</title> |
3 | 5 |
|
|
6 | 8 | </indexterm> |
7 | 9 |
|
8 | 10 | <para> |
9 | | - <filename>unaccent</> removes accents (diacritic signs) from a lexeme. |
10 | | - It's a filtering dictionary, that means its output is |
11 | | - always passed to the next dictionary (if any), contrary to the standard |
12 | | - behavior. Currently, it supports most important accents from European |
13 | | - languages. |
| 11 | + <filename>unaccent</> is a text search dictionary that removes accents |
| 12 | + (diacritic signs) from lexemes. |
| 13 | + It's a filtering dictionary, which means its output is |
| 14 | + always passed to the next dictionary (if any), unlike the normal |
| 15 | + behavior of dictionaries. This allows accent-insensitive processing |
| 16 | + for full text search. |
14 | 17 | </para> |
15 | 18 |
|
16 | 19 | <para> |
17 | | - Limitation: Current implementation of <filename>unaccent</> |
18 | | - dictionary cannot be used as a normalizing dictionary for |
19 | | - <filename>thesaurus</filename> dictionary. |
| 20 | + The current implementation of <filename>unaccent</> cannot be used as a |
| 21 | + normalizing dictionary for the <filename>thesaurus</filename> dictionary. |
20 | 22 | </para> |
21 | | - |
| 23 | + |
22 | 24 | <sect2> |
23 | 25 | <title>Configuration</title> |
24 | 26 |
|
25 | 27 | <para> |
26 | | - A <literal>unaccent</> dictionary accepts the following options: |
| 28 | + An <literal>unaccent</> dictionary accepts the following options: |
27 | 29 | </para> |
28 | 30 | <itemizedlist> |
29 | 31 | <listitem> |
|
43 | 45 | <itemizedlist> |
44 | 46 | <listitem> |
45 | 47 | <para> |
46 | | - Each line represents pair: character_with_accent character_without_accent |
| 48 | + Each line represents a pair, consisting of a character with accent |
| 49 | + followed by a character without accent. The first is translated into |
| 50 | + the second. For example, |
47 | 51 | <programlisting> |
48 | 52 | À A |
49 | 53 | Á A |
50 | | -Â A |
| 54 | +Â A |
51 | 55 | Ã A |
52 | | -Ä A |
53 | | -Å A |
54 | | -Æ A |
| 56 | +Ä A |
| 57 | +Å A |
| 58 | +Æ A |
55 | 59 | </programlisting> |
56 | 60 | </para> |
57 | 61 | </listitem> |
58 | 62 | </itemizedlist> |
59 | 63 |
|
60 | 64 | <para> |
61 | | - Look at <filename>unaccent.rules</>, which is installed in |
62 | | - <filename>$SHAREDIR/tsearch_data/</>, for an example. |
| 65 | + A more complete example, which is directly useful for most European |
| 66 | + languages, can be found in <filename>unaccent.rules</>, which is installed |
| 67 | + in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</> |
| 68 | + module is installed. |
63 | 69 | </para> |
64 | 70 | </sect2> |
65 | 71 |
|
66 | 72 | <sect2> |
67 | 73 | <title>Usage</title> |
68 | 74 |
|
69 | 75 | <para> |
70 | | - Running the installation script creates a text search template |
71 | | - <literal>unaccent</> and a dictionary <literal>unaccent</> |
| 76 | + Running the installation script <filename>unaccent.sql</> creates a text |
| 77 | + search template <literal>unaccent</> and a dictionary <literal>unaccent</> |
72 | 78 | based on it, with default parameters. You can alter the |
73 | 79 | parameters, for example |
74 | 80 |
|
75 | 81 | <programlisting> |
76 | | -=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules'); |
| 82 | +mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules'); |
77 | 83 | </programlisting> |
78 | 84 |
|
79 | 85 | or create new dictionaries based on the template. |
80 | 86 | </para> |
81 | 87 |
|
82 | 88 | <para> |
83 | | - To test the dictionary, you can try |
84 | | - |
| 89 | + To test the dictionary, you can try: |
85 | 90 | <programlisting> |
86 | | -=# select ts_lexize('unaccent','Hôtel'); |
87 | | - ts_lexize |
| 91 | +mydb=# select ts_lexize('unaccent','Hôtel'); |
| 92 | + ts_lexize |
88 | 93 | ----------- |
89 | 94 | {Hotel} |
90 | 95 | (1 row) |
91 | 96 | </programlisting> |
92 | 97 | </para> |
93 | | - |
| 98 | + |
94 | 99 | <para> |
95 | | - Filtering dictionary are useful for correct work of |
96 | | - <function>ts_headline</function> function. |
| 100 | + Here is an example showing how to insert the |
| 101 | + <filename>unaccent</> dictionary into a text search configuration: |
97 | 102 | <programlisting> |
98 | | -=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french ); |
99 | | -=# ALTER TEXT SEARCH CONFIGURATION fr |
| 103 | +mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french ); |
| 104 | +mydb=# ALTER TEXT SEARCH CONFIGURATION fr |
100 | 105 | ALTER MAPPING FOR hword, hword_part, word |
101 | 106 | WITH unaccent, french_stem; |
102 | | -=# select to_tsvector('fr','Hôtels de la Mer'); |
103 | | - to_tsvector |
| 107 | +mydb=# select to_tsvector('fr','Hôtels de la Mer'); |
| 108 | + to_tsvector |
104 | 109 | ------------------- |
105 | 110 | 'hotel':1 'mer':4 |
106 | 111 | (1 row) |
107 | 112 |
|
108 | | -=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels'); |
109 | | - ?column? |
| 113 | +mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels'); |
| 114 | + ?column? |
110 | 115 | ---------- |
111 | 116 | t |
112 | 117 | (1 row) |
113 | | -=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels')); |
114 | | - ts_headline |
| 118 | + |
| 119 | +mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels')); |
| 120 | + ts_headline |
115 | 121 | ------------------------ |
116 | | - <b>Hôtel</b>de la Mer |
| 122 | + <b>Hôtel</b> de la Mer |
117 | 123 | (1 row) |
118 | | - |
119 | 124 | </programlisting> |
120 | 125 | </para> |
121 | 126 | </sect2> |
122 | 127 |
|
123 | 128 | <sect2> |
124 | | - <title>Function</title> |
| 129 | + <title>Functions</title> |
125 | 130 |
|
126 | 131 | <para> |
127 | | - <function>unaccent</> function removes accents (diacritic signs) from |
128 | | - argument string. Basically, it's a wrapper around |
129 | | - <filename>unaccent</> dictionary. |
| 132 | + The <function>unaccent()</> function removes accents (diacritic signs) from |
| 133 | + a given string. Basically, it's a wrapper around the |
| 134 | + <filename>unaccent</> dictionary, but it can be used outside normal |
| 135 | + text search contexts. |
130 | 136 | </para> |
131 | 137 |
|
132 | 138 | <indexterm> |
133 | 139 | <primary>unaccent</primary> |
134 | 140 | </indexterm> |
135 | 141 |
|
136 | 142 | <synopsis> |
137 | | -unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) |
138 | | -returns <type>text</type> |
| 143 | +unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type> |
139 | 144 | </synopsis> |
140 | 145 |
|
141 | 146 | <para> |
| 147 | + For example: |
142 | 148 | <programlisting> |
143 | | -SELECT unaccent('unaccent', 'Hôtel'); |
144 | | -SELECT unaccent('Hôtel'); |
| 149 | +SELECT unaccent('unaccent', 'Hôtel'); |
| 150 | +SELECT unaccent('Hôtel'); |
145 | 151 | </programlisting> |
146 | 152 | </para> |
147 | 153 | </sect2> |
|
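The rules-file format described in the Configuration hunk above (one pair per line, a character with accent followed by its replacement) can be illustrated with a small sketch. This is not PostgreSQL's implementation; it only emulates the pair-per-line translation in Python. The names `parse_rules` and `apply_rules` are hypothetical, and the `ô o` line is added here for illustration and does not appear in the diff's sample.

```python
# Hypothetical sketch: emulates the "accented unaccented" rules format.
# It is NOT how the unaccent module is implemented in PostgreSQL.

def parse_rules(text):
    """Parse lines of 'accented unaccented' pairs into a dict."""
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            src, dst = parts
            rules[src] = dst
    return rules


def apply_rules(rules, s):
    """Translate every character that has a rule; keep the rest unchanged."""
    return "".join(rules.get(ch, ch) for ch in s)


# Sample pairs from the diff, plus one assumed pair (ô -> o) for the demo.
SAMPLE_RULES = """\
À A
Á A
Â A
Ã A
Ä A
Å A
Æ A
ô o
"""

rules = parse_rules(SAMPLE_RULES)
print(apply_rules(rules, "Hôtel"))  # prints "Hotel"
```

Characters without a rule pass through untouched, which mirrors the behavior shown in the `ts_lexize('unaccent','Hôtel')` example, where only `ô` changes.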