How to convert old postgresql db from ascii to utf8

Question

I have an old database from PostgreSQL 8.4 that my boss asked me to convert from SQL_ASCII encoding to UTF8. I tried pg_dump --encoding=UTF8 and got the message "invalid byte sequence for encoding "UTF8": 0xe76122". After more searching I tried pg_dump --encoding=ISO88591, which worked, and I could import the dump without problems into the new, empty UTF8 database that I created. But from time to time I get this message: "ERROR: character 0xc296 with encoding "UTF8" has no equivalent in "WIN1252";". Any solutions?

First, an encoding of SQL_ASCII essentially means no encoding was enforced, so it is possible to have content in different encodings in one database. Second, what are you doing when you get the error message? Third, what happens if you import with --encoding=WIN1252?
– Adrian Klaver, Commented Mar 31, 2021 at 19:24
For the first point, I'm not sure, but it probably is; if so, what could I do? For the second, the database is used by ERP software, so when I launch the software and open some menus, an error pop-up appears with this SQL error inside. For the third, I think you're suggesting I export as WIN1252, right? If so, I just tried that, and when I try to import it into my UTF-8 database it gives me the error "character 0x81 with encoding "WIN1252" has no equivalent in "UTF8"".
– André Lucas, Commented Mar 31, 2021 at 19:38
See this SO post Encoding. One of the answers mentions ODBC driver. Are you using ODBC and if so what driver?
– Adrian Klaver, Commented Mar 31, 2021 at 21:23
Hello. I'm using ODBC and it is ANSI. Should I try with ISO again and ODBC UNICODE?
– André Lucas, Commented Apr 1, 2021 at 19:44
I'm not sure what ISO is referring to? Given that the database is now using the UTF8 encoding I would say the ODBC UNICODE driver would be the one to use.
– Adrian Klaver, Commented Apr 1, 2021 at 21:45

Laurenz Albe · Accepted Answer · 2021-04-01 18:43:50Z

3

You made a mistake in converting the database.

It must have been encoded in WINDOWS-1252, not ISO 8859-1, and there must have been an "en dash" (Unicode U+2013, code point 0x96 in WINDOWS-1252).

When you dumped the database with encoding LATIN1 = ISO88591 and loaded it, the byte 0x96 was interpreted as Unicode U+0096, which is 0xC296 in UTF-8. This character does not exist in WINDOWS-1252, so the conversion fails.
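The mechanism described above can be reproduced outside the database. This is a minimal sketch that uses Python's codecs as stand-ins for PostgreSQL's LATIN1 and WIN1252 conversions (an assumption, though the two agree on this byte); the variable names are only for illustration.

```python
# Byte 0x96 as it sits in the SQL_ASCII database (no encoding enforced).
raw = b"\x96"

# Read as ISO 8859-1 (LATIN1): every byte maps to the code point of the same value.
as_latin1 = raw.decode("latin-1")
utf8_from_latin1 = as_latin1.encode("utf-8")

# Read as WINDOWS-1252: byte 0x96 is the en dash, U+2013.
as_cp1252 = raw.decode("cp1252")
utf8_from_cp1252 = as_cp1252.encode("utf-8")

print(f"LATIN1:  U+{ord(as_latin1):04X} -> UTF-8 {utf8_from_latin1.hex()}")
print(f"WIN1252: U+{ord(as_cp1252):04X} -> UTF-8 {utf8_from_cp1252.hex()}")

# A WIN1252 client cannot be sent U+0096, which is why the ERP software fails.
try:
    as_latin1.encode("cp1252")
    convertible = True
except UnicodeEncodeError:
    convertible = False
print("U+0096 representable in WIN1252:", convertible)
```

The first path yields the bytes c2 96 from the error message; the second yields a proper en dash that converts back to WIN1252 without trouble.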

You have to dump and restore the database again, but this time use

pg_dump --encoding=WIN1252
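If you want to script the dump and restore, something like the following sketch might do. The database names and dump file name are hypothetical, connection options (host, user) are left out, and restoring with psql assumes a plain-format dump; adjust to your setup.

```python
import subprocess


def build_commands(source_db, target_db, dump_file, encoding="WIN1252"):
    """Return the pg_dump and psql argument lists for a dump and restore."""
    dump_cmd = [
        "pg_dump",
        f"--encoding={encoding}",   # encoding the dump file is written in
        f"--file={dump_file}",
        source_db,
    ]
    restore_cmd = [
        "psql",
        "-v", "ON_ERROR_STOP=1",    # stop at the first conversion error
        f"--dbname={target_db}",
        f"--file={dump_file}",
    ]
    return dump_cmd, restore_cmd


def migrate(source_db, target_db, dump_file, encoding="WIN1252"):
    """Run the dump, then the restore; raises CalledProcessError on failure."""
    for cmd in build_commands(source_db, target_db, dump_file, encoding):
        subprocess.run(cmd, check=True)


# Hypothetical names, shown only to illustrate the resulting command lines.
dump_cmd, restore_cmd = build_commands("erp_old", "erp_utf8", "erp_win1252.sql")
print(" ".join(dump_cmd))
print(" ".join(restore_cmd))
```

Stopping on the first error makes a bad byte such as 0x81 show up immediately instead of leaving a partially loaded database.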

Since you also have code point 0x81 in your database, which is not assigned in WINDOWS-1252, it could also be WIN1251 (Cyrillic) or WIN1256 (Arabic). Or you have some wild mix; then you must fix the data before migration.
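To get a feel for whether the data is a mix, you could scan the raw bytes before migrating. The sketch below is only an aid under assumptions: it uses Python's cp1252, cp1251 and cp1256 codecs as approximations of PostgreSQL's WIN1252, WIN1251 and WIN1256, the helper names are made up, and the sample lines are invented.

```python
CANDIDATES = ("cp1252", "cp1251", "cp1256", "latin-1")


def decodable_with(data, candidates=CANDIDATES):
    """Map each candidate encoding to whether it can decode the bytes."""
    result = {}
    for name in candidates:
        try:
            data.decode(name)
            result[name] = True
        except UnicodeDecodeError:
            result[name] = False
    return result


def suspicious_lines(lines, encoding="cp1252"):
    """Yield (line_number, raw_line) for byte lines that fail to decode."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            yield number, line


# Invented sample: a Portuguese word, the stray 0x81, and the 0x96 dash.
sample = [b"pre\xe7o\n", b"bad \x81 byte\n", b"dash \x96\n"]
print(decodable_with(b"\x81"))
print(list(suspicious_lines(sample)))
```

A file opened in binary mode iterates as byte lines, so a plain-text dump taken without --encoding can be passed to suspicious_lines directly. Note that decoding without an error only shows an encoding is possible, not that it is the right one; the flagged lines still need a human look.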

edited Apr 1, 2021 at 18:43

answered Apr 1, 2021 at 3:15

Laurenz Albe


4 Comments

Adrian Klaver Over a year ago

See comment from @AndréLucas above where the above was attempted with a different error. My suspicion is there are multiple encodings in the original data.

Laurenz Albe Over a year ago

I have commented on that in my answer.

André Lucas Over a year ago

My software is in Portuguese, so there are a lot of accents and ç characters. Could that be why?

Laurenz Albe Over a year ago

No, it has to do with the encoding that the database clients used.
