How to convert a unicode string to a literal string in Python?

Question

Here are a few example (unicode) strings:

a = u'\u03c3\u03c4\u03b7\u03bd \u03a0\u03bb\u03b1\u03c4\u03b5\u03af\u03b1 \u03c4\u03bf\u03c5'
b = u'\u010deprav so mu doma\u010di in strici duhovniki odtegovali denarno pomo\u010d . Kljub temu mu je uspelo'
c = u'sovi\xe9ticas excepto Georgia , inclusive las 3 rep\xfablicas que hab\xedan'

My end goal is to split on the backslashes (and spaces), so that it looks like this:

split_a = ['u03c3', 'u03c4', 'u03b7', 'u03bd', '', 'u03a0', 'u03bb', 'u03b1', 'u03c4', 'u03b5', 'u03af', 'u03b1', '', 'u03c4', 'u03bf', 'u03c5']
split_b = ['', 'u010deprav', 'so', 'mu', 'doma', 'u010di', 'in', 'strici', 'duhovniki', 'odtegovali', 'denarno', 'pomo', 'u010d', '.', 'Kljub', 'temu', 'mu', 'je', 'uspelo']
split_c = ['sovi', 'xe9ticas', 'excepto', 'Georgia', ',', 'inclusive', 'las', '3',  'rep', 'xfablicas', 'que', 'hab', 'xedan']

(The empty places where there is both a space and a backslash are totally fine).

When I try to split using this:

a.split("\\"), it doesn't change the string at all.

I saw this example here, which makes me think that I need to make my strings literal strings (using r). However, I don't know how to convert my large list of strings into all literal strings.

When I searched on that, I got here. However, my compiler throws an error when I run a.encode('latin-1').decode('utf-8'). The error it throws is 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

So, my question is: How can I take a list of unicode strings, programmatically iterate through them and make them string literals, and then split on a backslash?

Python is an interpreted language, so the Python interpreter throws the error.
– linusg, Commented May 10, 2016 at 16:01
I think you're a bit above my level here, but thanks for the info!
– python_in_trouble, Commented May 10, 2016 at 16:05

Mark Ransom · Accepted Answer · 2016-05-10 16:03:26Z

3

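You have a Unicode string, which already has one Unicode code point per string element. The backslash escapes (such as \u03c3) are just how the string is written in source code and shown when its representation is printed to the console; they are not part of the actual contents, so there is no backslash character to split on.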
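As a quick check of the point above, the following minimal sketch (written to run on Python 3, where the `u` prefix is accepted but optional) inspects the first example string from the question. It only uses built-in operations and shows that the string holds code points, not backslashes:

```python
# First example string from the question.
a = u'\u03c3\u03c4\u03b7\u03bd \u03a0\u03bb\u03b1\u03c4\u03b5\u03af\u03b1 \u03c4\u03bf\u03c5'

# One element per code point: 14 Greek letters plus 2 spaces.
print(len(a))                 # 16

# The escapes live only in the source literal, so no backslash is stored.
print('\\' in a)              # False

# With nothing to split on, split() returns the whole string in a one-item list.
print(a.split('\\') == [a])   # True
```

This is why `a.split("\\")` appears to do nothing in the question.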

Making a list of numbers out of it is actually quite easy:

split_a = [ord(c) for c in a]

If you need to make a bunch of strings consisting of the letter u followed by the hex value, that's only slightly more complicated:

split_a = ', '.join('u' + ('%04x' % ord(c)) for c in a)
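To make the two snippets above concrete, here is a small sketch applied to the first example string. The names `code_points` and `hex_names` are illustrative, not from the answer. Note that the `join` version above yields a single comma-separated string; the list comprehension below keeps the pieces as a list instead, and the space character comes out as `u0020`:

```python
# First example string from the question.
a = u'\u03c3\u03c4\u03b7\u03bd \u03a0\u03bb\u03b1\u03c4\u03b5\u03af\u03b1 \u03c4\u03bf\u03c5'

# Integer code point for each character.
code_points = [ord(c) for c in a]

# 'u' followed by the zero-padded, lowercase hex value of each code point.
hex_names = ['u%04x' % ord(c) for c in a]

print(code_points[:5])  # [963, 964, 951, 957, 32]
print(hex_names[:5])    # ['u03c3', 'u03c4', 'u03b7', 'u03bd', 'u0020']
```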


3 Comments

python_in_trouble Over a year ago

The second one solved my problem for my example above. I've edited my question to include some more sample unicode strings, let me know if you have a solution for those other types of strings.

Christian Over a year ago

Was just about to push submit on a similar solution, so I'll just add a follow up comment - you'd have to do a bit more work to only display the values for characters that are unknown encodings. Specifically, in the OP's example, rendering the space character as " ", vs. "u0020".

Mark Ransom Over a year ago

@python_in_trouble wow, that's a completely different problem now, much more complex.

Daniel Roseman · Accepted Answer · 2016-05-10 16:08:52Z

1

You can use the unicode_escape codec to translate a unicode string to its escaped representation.

split_a = a.encode('unicode_escape').split('\\')

outputs:

['',
 'u03c3',
 'u03c4',
 'u03b7',
 'u03bd ',
 'u03a0',
 'u03bb',
 'u03b1',
 'u03c4',
 'u03b5',
 'u03af',
 'u03b1 ',
 'u03c4',
 'u03bf',
 'u03c5']
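The one-liner above relies on Python 2, where `encode` returns a `str`. On Python 3, `encode` returns `bytes`, so splitting on a text backslash raises a `TypeError`. A minimal sketch of one way to adapt it, assuming Python 3, is to decode the escaped bytes back to text before splitting:

```python
# First example string from the question.
a = u'\u03c3\u03c4\u03b7\u03bd \u03a0\u03bb\u03b1\u03c4\u03b5\u03af\u03b1 \u03c4\u03bf\u03c5'

# unicode_escape yields ASCII-only bytes such as b'\\u03c3...';
# decoding as ASCII turns them back into text that now contains real backslashes.
escaped = a.encode('unicode_escape').decode('ascii')

# Now there are actual backslash characters to split on.
split_a = escaped.split('\\')

print(split_a[:5])   # ['', 'u03c3', 'u03c4', 'u03b7', 'u03bd ']
print(len(split_a))  # 15
```

The leading empty string appears because the escaped text starts with a backslash.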


1 Comment

python_in_trouble Over a year ago

This worked for me if I then iterated through the split_a list and further split on " " (space).
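The two-step approach described in this comment (split on backslashes, then on spaces) can probably be folded into a single pass with `re.split`. The sketch below assumes Python 3, and the helper name `split_escaped` is hypothetical, introduced only for illustration:

```python
import re

def split_escaped(text):
    """Escape non-ASCII characters, then split on backslashes and spaces."""
    # e.g. u'sovi\xe9ticas' becomes the text 'sovi\\xe9ticas' with a real backslash.
    escaped = text.encode('unicode_escape').decode('ascii')
    # The character class matches either a single backslash or a single space.
    return re.split(r'[\\ ]', escaped)

b = u'\u010deprav so mu doma\u010di in strici duhovniki odtegovali denarno pomo\u010d . Kljub temu mu je uspelo'
c = u'sovi\xe9ticas excepto Georgia , inclusive las 3 rep\xfablicas que hab\xedan'

print(split_escaped(b)[:6])  # ['', 'u010deprav', 'so', 'mu', 'doma', 'u010di']
print(split_escaped(c)[:4])  # ['sovi', 'xe9ticas', 'excepto', 'Georgia']
```

Where a space is directly followed by an escaped character, the result contains an empty string, which the question says is acceptable.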
