I have this old database from postgresql version 8.4 that my boss told me to try to convert from SQL_ASCII encoding to UTF8 encoding.
I tried to pg_dump --encoding=UTF8 and I got the message "invalid byte sequence for encoding “UTF8” : 0xe76122"
And searching more i tried pg_dump --encoding=ISO88591, and it worked, and I could also import it with no problems to the new empty UTF8 database that i created, but from time to time, i'm getting this message: "ERROR: character 0xc296 with encoding "UTF8" has no equivalent in "WIN1252";". Any solutions?
1 Answer
You made a mistake in converting the database.
It must have been encoded in WINDOWS-1252, not ISO 8859-1, and there must have been an “em dash” (Unicode U+2013, code point 96 in WINDOWS-1252).
When you dumped the database with encoding LATIN1 = ISO88591 and loaded it, the byte 0x96 was interpreted as Unicode U+0096, which is 0xC296 in UTF-8. This character does not exist in WINDOWS-1252, so the conversion fails.
You have to dump and restore the database again, but this time use
pg_dump --encoding=WIN1252
Since you also have code point 0x81 in your database, it could aso be WIN1251 (Cyrillic) or WIN1256 (Arabic). Or you have some wild mix – then you must fix the data before migration.
SQL_ASCIIessentially meant no encoding was enforced, so it is possible to have content in different encodings in a database. Second, what are you doing when you get the error message? Third, what happens if you import with--encoding=WIN1252?ISOis referring to? Given that the database is now using theUTF8encoding I would say theODBC UNICODEdriver would be the one to use.