Unicode string to Unicode character, Python 3

Question

I'm programming using Python 3.x. Say I have the following Unicode string:

my_string = '\xed\x95\x9c'

'\xed\x95\x9c' is actually the UTF-8 byte stream for the Korean character 한. What's the easiest way to convert my_string to 한? my_string.decode('utf-8') doesn't work because my_string is a Unicode string, not a byte string.

unutbu · Accepted Answer · 2017-06-16 23:33:52Z


There are many possible encode/decode chains which lead to the desired result. Here is one:

In [257]: '\xed\x95\x9c'.encode('latin-1').decode('utf-8')
Out[257]: '한'
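To see why this particular chain works, here is a minimal sketch that splits it into two steps. The variable names raw_bytes and decoded are illustrative and not part of the original answer. Latin-1 maps each code point from U+0000 to U+00FF to the single byte with the same value, so encoding with it recovers the original UTF-8 bytes, which can then be decoded normally.

```python
# The str from the question: three code points, U+00ED, U+0095, U+009C.
my_string = '\xed\x95\x9c'

# Latin-1 turns each code point in the range 0-255 into the byte with the
# same numeric value, so the UTF-8 byte sequence comes back unchanged.
raw_bytes = my_string.encode('latin-1')

# Those three bytes are the UTF-8 encoding of a single character.
decoded = raw_bytes.decode('utf-8')

print(len(my_string), raw_bytes, len(decoded), decoded)
```

Note that this only applies when every character in the string is at or below U+00FF; otherwise encode('latin-1') raises UnicodeEncodeError.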

Here is the code I used to find this encode/decode chain.
