How to escape UNICODE string in python (to javascript escape)

Question

I have the following string "◣⛭◣◃✺▲♢" and I want to make that string into "\u25E3\u26ED\u25E3\u25C3\u273A\u25B2\u2662". Exactly the same as this site does https://mothereff.in/js-escapes

I was wondering if this is possible in Python. I have tried a lot of things from the Python unicode docs but failed miserably.

Example of what I tried before:

#!/usr/bin/env python
# -*- coding: latin-1 -*-

f = open('js.js', 'r').read()

print(ord(f[:1]))

Help would be appreciated!

try u"◣⛭◣◃✺▲♢".encode('unicode-escape')

– georg, Commented Feb 13, 2016 at 18:28

Nikita · Accepted Answer · 2016-02-13 18:28:33Z

4

Considering you're using Python 3:

unicode_string = "◣⛭◣◃✺▲♢"
byte_string = unicode_string.encode('ascii', 'backslashreplace')
print(byte_string)

See the codecs module documentation for more information.
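As a small sketch of what the snippet above gives on Python 3: `encode` returns a `bytes` object, so it prints with a `b'...'` wrapper. Decoding it back as ASCII yields a plain `str` holding the escapes. The name `escaped` is only illustrative.

```python
unicode_string = "◣⛭◣◃✺▲♢"

# 'backslashreplace' turns every character that ASCII cannot represent
# into a backslash escape; the result is a bytes object.
byte_string = unicode_string.encode('ascii', 'backslashreplace')

# Decode the pure-ASCII bytes to get an ordinary str of escapes.
escaped = byte_string.decode('ascii')
print(escaped)
```

Note that the hex digits come out lowercase, and that this error handler follows Python's escape syntax rather than JavaScript's.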

However, to work with JavaScript notation, there's a special module json, and then you could achieve the same thing:

import json
unicode_string = "◣⛭◣◃✺▲♢"
json_string = json.dumps(unicode_string)
print(json_string)
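To make the behaviour concrete, here is a minimal sketch (Python 3 assumed) of what `json.dumps` returns for the string from the question and for a character outside the Basic Multilingual Plane. The variable names are illustrative only.

```python
import json

bmp_text = "◣⛭◣◃✺▲♢"
astral_text = "\U0001F4A9"  # U+1F4A9, outside the Basic Multilingual Plane

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
# and wraps the result in double quotes, like a JavaScript string literal.
bmp_json = json.dumps(bmp_text)
astral_json = json.dumps(astral_text)

print(bmp_json)
print(astral_json)  # emitted as a UTF-16 surrogate pair

# Drop the surrounding quotes if only the escaped body is wanted.
bare = bmp_json[1:-1]
print(bare)
```

The escapes use lowercase hex digits, which JavaScript accepts just like the uppercase form shown in the question.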

1 Comment

bobince Over a year ago

+1 for json.dumps: use the right escaper for the job. Python unicode-escape is not the same syntax as JSON/JavaScript (it'll fail for characters outside the Basic Multilingual Plane: Python will say \U0001f4a9 where JS wants \uD83D\uDCA9)

Seth · Accepted Answer · 2016-02-13 18:32:59Z

0

If you're on Python 2, then I suspect you're getting something like this:

>>> s = "◣⛭◣◃✺▲♢"
>>> s[0]
'\xe2'

To get to the unicode code points in a UTF-8 encoded file (or buffer), you'll need to decode it into a python unicode object first (otherwise you'll see the bytes that make up the UTF-8 encoding).

>>> s_utf8 = s.decode('utf-8')
>>> s_utf8[0]
u'\u25e3'
>>> ord(s_utf8[0])
9699
>>> hex(ord(s_utf8[0]))
'0x25e3'

In your case, you can go straight from the ord() to a literal unicode escape with something like this:

>>> "\\u%04x" % (ord(s_utf8[0]))
'\\u25e3'

Or convert the entire string in one go with a list comprehension:

>>> ''.join(["\\u%04x" % (ord(c)) for c in s_utf8])
'\\u25e3\\u26ed\\u25e3\\u25c3\\u273a\\u25b2\\u2662'

Of course, when you're doing the conversion this way, you're going to display the code points for all the characters in the string. You'll have to decide which code points to show, or the ABCs will be escaped too:

>>> ''.join(["\\u%04x" % (ord(c)) for c in u"ABCD"])
'\\u0041\\u0042\\u0043\\u0044'

Or, just use georg's suggestion to let python figure all that out for you.
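If you do want to roll this by hand, the two caveats (ASCII letters being escaped, and code points beyond the BMP) can be handled in one place. The following is only a sketch under some assumptions: it targets Python 3, the helper name `js_escape` is made up for illustration, and it does not escape quotes, backslashes, or control characters the way `json.dumps` does.

```python
def js_escape(text):
    # Leave ASCII as-is; write every other character as one or two
    # backslash-u escapes built from its UTF-16 code units.
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # UTF-16 encodes BMP characters as one 16-bit unit and
        # everything else as a surrogate pair (two units).
        data = ch.encode('utf-16-be')
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], 'big')
            out.append('\\u%04X' % unit)
    return ''.join(out)


print(js_escape("◣⛭◣◃✺▲♢"))
print(js_escape("ABCD"))
print(js_escape("\U0001F4A9"))
```

The uppercase `%04X` is chosen only to match the output format shown in the question; for anything headed into real JavaScript or JSON, `json.dumps` remains the safer choice.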

3 Comments

bobince Over a year ago

This will fail for characters outside the Basic Multilingual Plane (on wide builds, including all Python 3.3+): ord(c) can take more than four hex digits.

Seth Over a year ago

If the target here is JavaScript, it probably doesn't matter. JS's "\u" escapes would require surrogate pairs outside the BMP, and this method won't make them. At that point you should be using json.dumps, i.e.: json.dumps("𐌀𐌁𐌂") -> "\ud800\udf00\ud800\udf01\ud800\udf02"

Seth Over a year ago

I.e., what you said on your comment to @nikita's answer. :)
