
I have a string of unicode ordinals (in hex form) like so:

\u063a\u064a\u0646\u064a\u0627

It's the Unicode representation of the Arabic string غينيا (taken from an Arabic lorem ipsum generator).

I want to convert the Unicode hex string to غينيا. I tried print u'%s' % "\u063a\u064a\u0646\u064a\u0627" (pointed out here), but that simply prints the hex format, not the symbols. print word.replace("\u","\\u") doesn't do the job either. What should I do?

  • Is \u063a\u064a\u0646\u064a\u0627 an ascii string, where the backslashes are actually escaped? Commented Aug 21, 2017 at 14:40
  • Where are you outputting the string to? If it is a console, then the console may not have full unicode support. Commented Aug 21, 2017 at 14:41
  • @IzaakvanDongen: actually not escaped. Should I run a quick s.replace("\u", "\\u") on the hex string before trying to print it? Commented Aug 21, 2017 at 14:43
  • Still not clear what you actually have. Do you have that value in a variable? What does len say the length is? Better provide actual Python code we can play with. Commented Aug 21, 2017 at 14:44
  • @JakeConkerton-Darby: good question. Yes, indeed to the console - but my ultimate aim is to draw this text on top of an image using PIL (see here: stackoverflow.com/questions/45675525/…) Commented Aug 21, 2017 at 14:44

1 Answer

I'm not entirely sure from the question what you want, so I'll cover both cases I can see.

Case 1: You just want to output the Arabic string from your code, using the Unicode literal syntax. In this case, prefix your string literal with a u and you'll be right as rain:

s = u"\u063a\u064a\u0646\u064a\u0627"
print(s)

This does the same as

print u'%s' % s

except shorter. Formatting s into a template that contains nothing but %s makes no sense here, because it changes nothing: u'%s' % s == s.
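As a quick check of that claim, here is a minimal sketch written with function-call print so it should run on both Python 2 and Python 3. The names formatted and code_points are illustrative only and not part of the original answer.

```python
s = u"\u063a\u064a\u0646\u064a\u0627"

# Formatting into a template that is only %s gives back an equal string.
formatted = u"%s" % s

# Inspect the code points instead of relying on what the console can display.
code_points = [hex(ord(ch)) for ch in formatted]

print(formatted == s)   # True
print(code_points)      # ['0x63a', '0x64a', '0x646', '0x64a', '0x627']
```

Looking at code points rather than printed glyphs keeps the check independent of the console's Unicode support.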

Case 2: You have an escaped string from some other source that you want to evaluate as a Unicode string. This is kind of what it looks like you're trying to do with print u'%s' %. This can be done with

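import ast
# r-prefix: the backslashes are literal characters, as if read from a file
s = r"\u063a\u064a\u0646\u064a\u0627"
# Wrap the text in u'...' and let Python parse it as a Unicode literal
print(ast.literal_eval("u'{}'".format(s)))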

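Note that, unlike eval, this is safe: literal_eval only accepts literals and rejects anything like a function call. (It will still raise an error if s contains an unescaped single quote, since that ends the literal early.) Also note that s here is an r-prefixed string, so the backslashes aren't escaping anything; they are literal backslash characters.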

Both pieces of code correctly output

غينيا
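For case 2, a possible alternative to literal_eval is Python's built-in unicode_escape codec, which decodes \uXXXX sequences directly. The sketch below is not from the answer above: decode_escapes is a hypothetical helper name, and it assumes the escaped text contains only ASCII characters.

```python
def decode_escapes(raw):
    # Hypothetical helper: turn literal backslash-u sequences into characters.
    # Assumption: `raw` is pure ASCII; encode("ascii") raises otherwise.
    return raw.encode("ascii").decode("unicode_escape")

raw = r"\u063a\u064a\u0646\u064a\u0627"   # 30 ASCII characters
decoded = decode_escapes(raw)              # 5 Arabic characters

print(len(raw), len(decoded))
```

On a console that supports Unicode, print(decoded) should then show the same Arabic word. Unlike the literal_eval approach, this does not break when the text contains a quote character.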

Some elaboration on why print u'%s' % s fails when s is the escaped string from case 2. Python only turns \uXXXX sequences into characters when it parses a Unicode literal in source code. Once the text exists as a string whose backslashes are literal characters, formatting it does not interpret it again, and normal string operations can't do that either, so you have to use literal_eval to evaluate it as a literal a second time. With that escaped s, when you run

print u'%s' % s

the output is

\u063a\u064a\u0646\u064a\u0627

Note that this isn't a representation of a Unicode object but literally an ASCII string made of backslashes, letters, and digits.
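If it is unclear which of the two kinds of string you actually hold, checking its length is a quick way to tell, as one of the comments on the question suggests. This is a minimal sketch, and the variable names escaped and real are illustrative only.

```python
escaped = r"\u063a\u064a\u0646\u064a\u0627"   # literal backslashes, as in case 2
real = u"\u063a\u064a\u0646\u064a\u0627"      # actual Arabic characters, as in case 1

print(len(escaped))                # 30: six ASCII characters per escape sequence
print(len(real))                   # 5: one character per letter
print(escaped.startswith("\\u"))   # True: the first character really is a backslash
```

A length of 30 means the escapes still need decoding. A length of 5 means the string is already fine, and any remaining display problem lies with the output device.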


2 Comments

Isn't Case 1 the same as print u'%s' % s (where s = '\u063a\u064a\u0646\u064a\u0627')
Oh wait I may have misread.. I'll elaborate some more : )
