(Not a duplicate of 4079956)
I have an SQL_ASCII database with LC_CTYPE=LC_COLLATE="C". It contains mostly ASCII data, plus some non-ASCII characters from a single codepage, say LATIN1.
I want to transcode, in place (no pg_dump/pg_restore), all non-ASCII characters from LATIN1 to UTF-8, then alter the database encoding to UTF-8, e.g.:
-- change encoding first, transcode data after
UPDATE pg_database SET encoding=pg_char_to_encoding('UTF8')
WHERE datname='sqlasciidb';
UPDATE tbl SET str=convert_from(str::bytea, 'LATIN1')
WHERE str::bytea<>convert_from(str::bytea, 'LATIN1')::bytea;
or
-- transcode data first, change encoding after
CREATE DOMAIN my_varlena AS bytea;
CREATE CAST (my_varlena AS text) WITHOUT FUNCTION;
UPDATE tbl SET str=convert(str::bytea, 'LATIN1','UTF8')::my_varlena::text
WHERE str::bytea<>convert(str::bytea, 'LATIN1', 'UTF8');
DROP DOMAIN my_varlena CASCADE;
UPDATE pg_database SET encoding=pg_char_to_encoding('UTF8')
WHERE datname='sqlasciidb';
What, if anything, is wrong with the above approach?
Some problems I can see:
- after pg_database is updated, all connections to the database should be closed and reopened for the backend to take into account the new encoding
- all indexes based on the altered columns should be rebuilt
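
For illustration, those two follow-up steps might look roughly like the sketch below. It assumes PostgreSQL 9.2 or later (where pg_stat_activity exposes the backend process ID as pid; older releases call that column procpid) and a superuser session. The names tbl and sqlasciidb are the ones used in the examples above.

```sql
-- Sketch only, not a verified procedure.
-- Disconnect every other session on the database so that new
-- connections see the changed encoding.
-- Assumption: PostgreSQL 9.2+ (column "pid"; use "procpid" before 9.2).
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'sqlasciidb'
  AND pid <> pg_backend_pid();

-- Then, ideally from a freshly opened connection, rebuild every index
-- on the table whose column was transcoded.
REINDEX TABLE tbl;
```

The current session is excluded from the termination query, so it would still need to be closed and reopened by hand.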
Anything else?