Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

Question

I work with a payment API and it returns some XML. For logging I want to save the API response in my database.

One word in the API response should be "manhã", but the API returns "manh�". Other characters such as á or ç are returned correctly, so I guess this is a bug in the API.

But when trying to save this in my DB I get:

Postgres invalid byte sequence for encoding "UTF8": 0xc3 0x2f

How can I solve this?

I tried things like

response.encode("UTF-8") and also force_encoding, but all I get is:

Encoding::UndefinedConversionError ("\xC3" from ASCII-8BIT to UTF-8)

I need to either remove this wrong character or convert it somehow.
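
The error above can be reproduced in isolation. This is a minimal sketch, assuming the response body arrives tagged as binary (ASCII-8BIT), as the error message suggests. The sample string is hypothetical, built from the two bytes that Postgres reports:

```ruby
# Assumption: the HTTP body arrives tagged as binary (ASCII-8BIT).
# The sample value is hypothetical, built from the bytes in the Postgres error.
raw = "manh\xC3/".b

# Tagging the bytes as UTF-8 does not make them valid UTF-8:
# 0xC3 starts a two-byte sequence, but 0x2F ("/") is not a continuation byte.
as_utf8 = raw.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?

# Transcoding straight from binary fails because bytes above 0x7F have no
# defined mapping when the source encoding is binary.
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
  puts e.class
end
```

Both failures have the same cause: nothing has told Ruby which encoding the bytes are really in.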

Are you sure that "a payment API" is giving you UTF-8 at all?
– AmigoJack, Commented Oct 19, 2020 at 18:09
@AmigoJack The API returns XML in ISO-8859-1. My Rails table field is a normal "character varying". I have other APIs that return UTF-8, and I need to store them all in the same column. So I need to convert the API response in some way to be able to save it in the DB.
– almo, Commented Oct 19, 2020 at 18:15
The XML starts like this: <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
– almo, Commented Oct 19, 2020 at 18:54
It should be obvious: ISO-8859-1 is a different encoding than UTF-8. You have to convert one to the other instead of passing it through unhandled.
– AmigoJack, Commented Oct 20, 2020 at 0:20
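
For the situation described in these comments (some APIs return ISO-8859-1, others UTF-8, and everything goes into one column), one option is to read the encoding from the XML declaration before converting. This is only a sketch: the helper name xml_to_utf8 and its default: parameter are hypothetical and do not come from the post or from any library.

```ruby
# Hypothetical helper (not from the original post): convert an XML response
# body to UTF-8 based on the encoding named in its XML declaration.
def xml_to_utf8(body, default: 'UTF-8')
  raw = body.b # binary-tagged copy, so the original string is left untouched

  # Pull the encoding name out of <?xml ... encoding="..." ?> if present.
  declared = raw[/\A\s*<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']/i, 1]
  source = Encoding.find(declared || default) # raises ArgumentError if unknown

  if source == Encoding::UTF_8
    # Already UTF-8: only the tag needs to change, the bytes stay as they are.
    raw.force_encoding(Encoding::UTF_8)
  else
    raw.encode(Encoding::UTF_8, source)
  end
end

latin1_body = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?><p>manh\xE3</p>".b
utf8_body   = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>manh\u00E3</p>"

from_latin1 = xml_to_utf8(latin1_body)
from_utf8   = xml_to_utf8(utf8_body)
puts from_latin1.encoding
puts from_latin1.valid_encoding?
```

The sketch trusts the declaration. A body that declares UTF-8 but contains invalid bytes would still need a valid_encoding? check before it is saved.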

Mark G. · Accepted Answer · 2020-10-20 14:10:13Z


You're on the right track: the encode method solves this. When the source encoding is known, you can simply use:

response.encode('UTF-8', 'ISO-8859-1')

Sometimes the source contains bytes that are invalid in the source encoding. To avoid exceptions in that case, you can tell Ruby how to handle them:

# This will transcode the string to UTF-8 and replace any invalid/undefined characters with '' (empty string)
response.encode('UTF-8', 'ISO-8859-1', invalid: :replace, undef: :replace, replace: '')

This is all laid out in the Ruby docs for String - check them out!
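
As a rough illustration of the two calls above, here is a small sketch. The sample bytes are an assumption: 0xE3 is "ã" in ISO-8859-1, and the string is tagged as binary to mimic a raw response body.

```ruby
# Assumed sample: "manhã" as ISO-8859-1 bytes, tagged as binary.
raw = "manh\xE3".b

# Plain transcoding: interpret the bytes as ISO-8859-1 and write them as UTF-8.
utf8 = raw.encode('UTF-8', 'ISO-8859-1')
puts utf8.valid_encoding?
puts utf8 == "manh\u00E3"

# The replacement options only take effect when a byte cannot be converted.
# To show that, the source is treated as binary here, where bytes above 0x7F
# are undefined, so the byte is dropped instead of raising.
stripped = raw.encode('UTF-8', 'BINARY', invalid: :replace, undef: :replace, replace: '')
puts stripped
```

Dropping bytes loses data, so the plain form is preferable whenever the real source encoding is known.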

---

Note: many people incorrectly assume that force_encoding will somehow fix encoding problems. force_encoding simply tags the string as the specified encoding. It does not transcode, and it does not replace or remove invalid characters. When you're converting between encodings, you must transcode so that characters in one character set are correctly represented in the other character set.

As pointed out in the comments, force_encoding can be part of a transcode when it is combined with encode: response.force_encoding('ISO-8859-1').encode('UTF-8') (which is equivalent to the first example using encode above).
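
The difference between tagging and transcoding can be checked directly. This is a minimal sketch using an assumed sample value ("manhã" as ISO-8859-1 bytes):

```ruby
# Assumed sample: "manhã" as ISO-8859-1 bytes, tagged as binary.
raw = "manh\xE3".b

# force_encoding alone only changes the tag: the bytes are identical,
# and the result is not valid UTF-8.
tagged = raw.dup.force_encoding('UTF-8')
puts tagged.bytes == raw.bytes
puts tagged.valid_encoding?

# Tagging with the real source encoding and then encoding does transcode,
# and gives the same result as the two-argument form of encode.
via_force  = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
via_encode = raw.encode('UTF-8', 'ISO-8859-1')
puts via_force == via_encode
```

dup is used because force_encoding modifies the receiver in place.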

edited Oct 20, 2020 at 14:10

answered Oct 20, 2020 at 2:47


3 Comments

AmigoJack Over a year ago

The source encoding is known and it has no invalid sequences - there should neither be the need to "drop", nor to force something. Just convert it and let exceptions occur, as none are expected.

mu is too short Over a year ago

force_encoding will help: response.force_encoding('ISO-8859-1').encode('UTF-8') for example.

Mark G. Over a year ago

@muistooshort - yes, you can use force_encoding in that way. My point was more that people use force_encoding thinking that it will somehow transcode the string, when all it does is change what encoding the string is tagged as. I'll update the post for clarity around this point. Thanks! @AmigoJack - Good call. When both source and destination encodings are known, you shouldn't have to specify the replacement options, assuming the data is actually valid in the source encoding. I'll update the answer with a bit more context.
