I have a Delphi 6 program (single-byte characters) that sorts strings in a TStringList using the default case-insensitive AnsiCompareText function, which in turn calls the CompareStringA function in the Windows kernel32.dll. (Regional settings are Hungarian.)
I'd like to do the same sorting in a PostgreSQL database on a Kubuntu system (linux-image-3.2.0-65-generic-pae, 32-bit x86, KDE 4.8.5). The database is created by
CREATE DATABASE <...>
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'hu_HU.UTF-8'
LC_CTYPE = 'hu_HU.UTF-8'
CONNECTION LIMIT = -1;
If I sort with the C or POSIX collation, accented characters are not sorted into their alphabetical positions. If I sort with the database's default collation (hu_HU.UTF-8), spaces and some special characters are ignored, which is a problem when they occur at the beginning of the string. (Specifying a collation per query has been easy since PostgreSQL 9.1: see http://www.postgresql.org/docs/9.3/static/collation.html.)
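To illustrate, here is a minimal sketch of the kind of comparison I mean. The sample strings and the names t and s are made up for illustration, and the exact order returned by the second query depends on the locale data installed on the system:

```sql
-- Hypothetical sample data: leading space, leading '@', accented and plain letters.

-- Byte-order sort: space and '@' keep their ASCII positions,
-- but accented letters land after all unaccented ones.
SELECT s
FROM (VALUES (' alma'), ('@alma'), ('alma'), ('álom'), ('Alma'), ('zebra')) AS t(s)
ORDER BY s COLLATE "C";

-- Sort with the database default collation (hu_HU.UTF-8 here):
-- accented letters are ordered alphabetically, but leading spaces
-- and punctuation no longer decide the order.
SELECT s
FROM (VALUES (' alma'), ('@alma'), ('alma'), ('álom'), ('Alma'), ('zebra')) AS t(s)
ORDER BY s;
```

Neither of the two results matches the order the Delphi program produces on Windows.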
Several questions have been asked on this topic, e.g. "PostgreSQL Sort". The answer there can't be generalized: it deals with the '@' in the first character position only.
My question is perhaps a duplicate of "Is there any way to have PostgreSQL not collapse punctuation and spaces when collating using a language?" The answer there points to the PostgreSQL TODO list: http://wiki.postgresql.org/wiki/Todo:ICU Has anything changed since then?
What I want is a collation that keeps spaces and special characters in their ASCII positions and sorts accented characters alphabetically, exactly as Windows does.
Do I have to write a custom locale (and if so, how)? Or a custom comparison function, perhaps written in Delphi (and how do I add it to PostgreSQL)? Another option would be translating the special characters to hexadecimal, for example, but then they would be sorted in among the text. Translating ALL characters to hexadecimal (and mapping case and accent differences to the same code) seems terrible: it would mean writing the complete collation myself. I'm sure there must be a solution for this.