
I have an old database from PostgreSQL version 8.4 that my boss told me to try to convert from SQL_ASCII encoding to UTF8 encoding. I tried pg_dump --encoding=UTF8 and got the message "invalid byte sequence for encoding "UTF8": 0xe76122". After searching more, I tried pg_dump --encoding=ISO88591, and it worked. I could also import the dump with no problems into the new, empty UTF8 database that I created, but from time to time I'm getting this message: "ERROR: character 0xc296 with encoding "UTF8" has no equivalent in "WIN1252"". Any solutions?

  • First an encoding of SQL_ASCII essentially meant no encoding was enforced, so it is possible to have content in different encodings in a database. Second, what are you doing when you get the error message? Third, what happens if you import with --encoding=WIN1252? Commented Mar 31, 2021 at 19:24
  • So, for the 1st, I'm not sure, but probably it is; if so, what could I do? For the 2nd, the database is used by ERP software, so when I launch the software and go to some menus, an error pop-up appears with this SQL error inside. For the 3rd, I think you're suggesting I export as WIN1252, right? If so, I just tried, and when I try to import it into my UTF-8 database it gives me the error "character 0x81 with encoding "WIN1252" has no equivalent in "UTF8"". Commented Mar 31, 2021 at 19:38
  • See this SO post Encoding. One of the answers mentions ODBC driver. Are you using ODBC and if so what driver? Commented Mar 31, 2021 at 21:23
  • Hello. I'm using ODBC and it is ANSI. Should I try with ISO again and ODBC UNICODE? Commented Apr 1, 2021 at 19:44
  • I'm not sure what ISO is referring to? Given that the database is now using the UTF8 encoding I would say the ODBC UNICODE driver would be the one to use. Commented Apr 1, 2021 at 21:45

1 Answer


You made a mistake in converting the database.

It must have been encoded in WINDOWS-1252, not ISO 8859-1, and it must have contained an “en dash” (Unicode U+2013, byte 0x96 in WINDOWS-1252).

When you dumped the database with encoding LATIN1 (= ISO88591) and loaded it, the byte 0x96 was interpreted as Unicode U+0096, a control character, which is 0xC296 in UTF-8. This character does not exist in WINDOWS-1252, so the conversion fails whenever a client using the WIN1252 client encoding reads that value.
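The mix-up can be reproduced outside the database. The following is only a sketch using Python's built-in codecs, on the assumption that they agree with PostgreSQL's conversion tables for this byte; the variable names are made up for illustration.

```python
# Illustration only: Python's codecs are assumed to match PostgreSQL's
# conversion tables for the byte discussed above.
raw = b"\x96"  # the byte as stored in the SQL_ASCII database

as_win1252 = raw.decode("cp1252")   # what the client meant: U+2013 EN DASH
as_latin1 = raw.decode("latin-1")   # what the LATIN1 dump assumed: U+0096

# These are the bytes that end up in the new UTF8 database.
utf8_stored = as_latin1.encode("utf-8")

# Converting the stored character back for a WIN1252 client is not possible.
try:
    as_latin1.encode("cp1252")
    convertible = True
except UnicodeEncodeError:
    convertible = False

print(f"U+{ord(as_win1252):04X}", f"U+{ord(as_latin1):04X}",
      utf8_stored.hex(), convertible)
```

The last step fails for the same reason the ERP client sees the error: U+0096 has no slot in WINDOWS-1252.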

You have to dump and restore the database again, but this time use

pg_dump --encoding=WIN1252

Since you also have the byte 0x81 in your database, which is undefined in WIN1252, the encoding could also be WIN1251 (Cyrillic) or WIN1256 (Arabic). Or you have a wild mix of encodings, in which case you must fix the data before migration.
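To judge which candidate encoding fits, or whether the data is mixed, it can help to look for bytes that a given encoding cannot decode. Below is a rough sketch; `undecodable_offsets` and the sample bytes are hypothetical, it only makes sense for single-byte encodings, and it relies on Python's codec tables rather than PostgreSQL's.

```python
from typing import List


def undecodable_offsets(data: bytes, encoding: str) -> List[int]:
    """Return offsets of bytes that the given single-byte encoding cannot decode."""
    bad = []
    for i in range(len(data)):
        try:
            data[i:i + 1].decode(encoding)
        except UnicodeDecodeError:
            bad.append(i)
    return bad


# Hypothetical sample: 'a', then the bytes 0x81, 0x96 and 0xE7 from this thread.
sample = b"a\x81\x96\xe7"

for enc in ("cp1252", "cp1251", "cp1256"):
    print(enc, undecodable_offsets(sample, enc))
```

An empty result only means the bytes are valid in that encoding, not that the decoded text makes sense; for Portuguese data, 0x81 decoding to a Cyrillic or Arabic letter would still point to damaged or mixed data that needs a manual look.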


4 Comments

See comment from @AndréLucas above where the above was attempted with a different error. My suspicion is there are multiple encodings in the original data.
I have commented on that in my answer.
My software is in Portuguese, so there are a lot of accents and ç characters. Is that maybe why?
No, it has to do with the encoding that the database clients used.
