UnicodeDecodeError on byte type

Question

Using Python 3.4, I'm getting the following error when trying to decode a bytes object using UTF-32:

Traceback (most recent call last):
  File "c:.\SharqBot.py", line 1130, in <module>
    fullR=s.recv(1024).decode('utf-32').split('\r\n')
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)

and the following when trying to decode it as UTF-16:

  File "c:.\SharqBot.py", line 1128, in <module>
    fullR=s.recv(1024).decode('utf-16').split('\r\n')
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x0a in position 374: truncated data

When I decode using UTF-8 there is no error. s is a socket connected to the Twitch IRC server irc.chat.twitch.tv on port 80.

It receives the following:

b':tmi.twitch.tv 001 absolutelyabot :Welcome, GLHF!\r\n:tmi.twitch.tv 002 absolutelyabot :Your host is tmi.twitch.tv\r\n:tmi.twitch.tv 003 absolutelyabot :This server is rather new\r\n:tmi.twitch.tv 004 absolutelyabot :-\r\n:tmi.twitch.tv 375 absolutelyabot :-\r\n:tmi.twitch.tv 372 absolutelyabot :You are in a maze of twisty passages, all alike.\r\n:tmi.twitch.tv 376 absolutelyabot :>\r\n'

Am I doing something wrong when trying to decode as UTF-16 and UTF-32? The reason I want to use UTF-32 is that occasionally someone sends a character that is not in UTF-8, and I want to be able to receive it instead of getting an error because UTF-8 does not support that character. Thanks for any help.

I'm not trying to avoid the error altogether; I'm trying to receive the characters that aren't supported in UTF-8.
– Shariq Ali, Commented Mar 21, 2016 at 19:48
So you can try to decode the whole line using UTF-8. If an exception is thrown, only then try an alternative charset. I doubt the IRC protocol would ever allow UTF-16 or UTF-32, because of embedded NULs.
– Antti Haapala, Commented Mar 21, 2016 at 19:56
"When I decode using utf-8 there is no error". So why do you think UTF-16 or UTF-32 should work??
– Mark Tolonen, Commented Oct 21, 2019 at 16:56
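The "try UTF-8 first, fall back only on failure" idea from the comments could look roughly like the sketch below. The helper name decode_irc_line and the choice of latin-1 as the fallback are assumptions made for illustration; nothing in the thread says which legacy charset the senders actually use.

```python
def decode_irc_line(raw, fallback="latin-1"):
    """Decode one IRC line as UTF-8, falling back to another charset.

    `fallback` is a guess (hypothetical default); use whatever legacy
    encoding your users are most likely to send.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Only reached when the bytes are not valid UTF-8.
        return raw.decode(fallback)


# "café" encoded two different ways decodes to the same text.
print(decode_irc_line("caf\u00e9".encode("utf-8")))
print(decode_irc_line("caf\u00e9".encode("latin-1")))
```

Because the fallback is a guess, text sent in some third encoding would still come out wrong; it just would not raise.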

RATAN KUMAR · Accepted Answer · 2018-04-04 09:58:51Z

21

Try using encoding='ISO-8859-1'.
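A small sketch of what this suggestion does in practice, using made-up sample text rather than anything from the question: ISO-8859-1 assigns a character to every byte value, so decoding never raises, but bytes that were really UTF-8 come out as different characters.

```python
# All 256 byte values decode under ISO-8859-1, so it cannot fail.
all_bytes = bytes(range(256))
assert len(all_bytes.decode("iso-8859-1")) == 256

# Hypothetical message that was sent as UTF-8.
original = "na\u00efve \u2603"
wire = original.encode("utf-8")

wrong = wire.decode("iso-8859-1")   # no error, but not the original text
right = wire.decode("utf-8")        # the intended text

print(repr(wrong))
print(repr(right))
```

So the absence of an exception here says nothing about whether ISO-8859-1 is the right encoding for the data.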

1 Comment

ShadowRanger Over a year ago

@CodeWarrior: Presumably the original text is latin-1 (the friendly name for ISO-8859-1) encoded, not utf-8. Or it isn't, but latin-1 is a one-to-one encoding where every byte maps to a character, so it's just masking errors and producing gibberish. Either way.

ShadowRanger · Accepted Answer · 2016-03-21 19:57:07Z

3

Every Unicode ordinal can be represented in UTF-8. If decoding as UTF-8 isn't working, that's because the bytes being transmitted are in a different encoding, or the data is a mix of text and binary data and only some of it is UTF-8. Odds are the text is UTF-8 encoded (most network protocols are), so any non-UTF-8 data would be framing data or the like, and the stream would need to be parsed to extract the text.
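As a quick check of the first claim, the sketch below round-trips a few arbitrarily chosen code points through UTF-8, from ASCII up to the highest code point U+10FFFF. The sample list is illustrative only.

```python
# Sample code points spanning the 1-, 2-, 3- and 4-byte UTF-8 forms.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    # Decoding the UTF-8 bytes gives back the same character.
    assert encoded.decode("utf-8") == ch
    lengths.append(len(encoded))
    print("U+%06X -> %d byte(s)" % (ord(ch), len(encoded)))
```

The one exception is lone surrogates (U+D800 to U+DFFF), which are not valid in any of the UTF encodings, so they do not change the argument.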

Any attempt to mask such an error in the text/binary case would just be silencing problems, not fixing them. You need to know the encoding of the data (and the format, if it's not all text data with a single encoding), and use that. The data you receive doesn't magically become UTF-16 or UTF-32 because you want it to.
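To make the last point concrete, here is a sketch of why the two tracebacks in the question occur, using the first line of the server greeting quoted there. The variable names are mine, and the exact wording of the error reasons can differ between Python versions.

```python
import struct

# First line of the server greeting quoted in the question.
data = b":tmi.twitch.tv 001 absolutelyabot :Welcome, GLHF!\r\n"

# UTF-32-LE treats every 4 bytes as one little-endian integer code point.
(first_unit,) = struct.unpack("<I", data[:4])   # the bytes b":tmi"
print(hex(first_unit), "valid" if first_unit < 0x110000 else "out of range")

utf32_failed = False
try:
    data.decode("utf-32-le")
except UnicodeDecodeError as exc:
    utf32_failed = True
    print("utf-32-le:", exc.reason)

# UTF-16-LE needs whole 2-byte code units, so an odd byte count cannot decode.
utf16_failed = False
try:
    data[:5].decode("utf-16-le")
except UnicodeDecodeError as exc:
    utf16_failed = True
    print("utf-16-le:", exc.reason)
```

The four ASCII bytes of ":tmi" read as a single integer are far above the largest code point, which matches the "not in range(0x110000)" message at position 0-3.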

1 Comment

Antti Haapala Over a year ago

IRC does not specify text encoding.

Anh Lan · Accepted Answer · 2019-10-21 16:44:43Z

0

You can try decode/encode('utf-16-le'). I tried it and it worked for me, but I am not really clear why. :P
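One possible explanation, offered as a guess rather than a diagnosis of this particular case: when the byte count happens to be even, UTF-16-LE pairs up ASCII bytes into code units without raising, but the resulting characters are unrelated to the original text. The sample bytes below are made up.

```python
data = b"PING"                       # 4 ASCII bytes, an even length

decoded = data.decode("utf-16-le")   # no exception is raised
code_units = [ord(c) for c in decoded]

# Each pair of ASCII bytes became one unrelated character.
print(len(decoded), [hex(u) for u in code_units])

# The round trip is lossless, which can make it look like it "worked".
assert decoded.encode("utf-16-le") == data
assert decoded != "PING"
```

Decoding and re-encoding with the same codec gives the bytes back, but the intermediate string is not readable text.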

1 Comment

Nick Martin Over a year ago

Please try to be clearer in your answer and explain why this worked for you. Perhaps describe what is different between your approach and the OP's.
