In Python, I have some Unicode text that contains non-breaking spaces, which I want to convert to 'x'. Non-breaking spaces are equal to chr(160). I have the following code, which works when I run Django from Eclipse on localhost: no errors, and all non-breaking spaces are converted.

my_text = u"hello"
my_new_text = my_text.replace(chr(160), "x")

However, when I run it any other way (Python command line, or Django via runserver instead of Eclipse), I get an error:

'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I guess this error makes sense because it's trying to compare Unicode (my_text) to something that isn't Unicode. My questions are:

  1. If chr(160) isn't Unicode, what is it?
  2. How come this works when I run it from Eclipse? Understanding this would help me determine if I need to change other parts of my code. I have been testing my code from Eclipse.
  3. (most important) How do I solve my original problem of removing the non-breaking spaces? my_text is definitely going to be Unicode.

1 Answer

  1. In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
  2. I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
  3. If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).

E.g.,

>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
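A minimal sketch of point 3, assuming you also want the code to run on Python 3, where unichr no longer exists and chr(160) already returns text. The literal u"\u00a0" names the NO-BREAK SPACE character on both versions. The helper names replace_nbsp and nbsp_from_byte are illustrative, not part of the original answer.

```python
# Runs unchanged on Python 2.7 and Python 3.3+ (both accept the u"" prefix).

# U+00A0 NO-BREAK SPACE: the same character as unichr(160) on Python 2
# and chr(160) on Python 3.
NBSP = u"\u00a0"


def replace_nbsp(text, replacement=u"x"):
    """Replace every NO-BREAK SPACE in a Unicode string with `replacement`."""
    return text.replace(NBSP, replacement)


def nbsp_from_byte():
    """Show that byte 0xA0 only becomes NO-BREAK SPACE once an encoding is chosen."""
    raw = b"\xa0"  # a single byte with value 160, like chr(160) on Python 2
    return raw.decode("latin-1")  # Latin-1 maps byte 0xA0 to U+00A0


if __name__ == "__main__":
    print(replace_nbsp(u"hello\u00a0world"))
    print(nbsp_from_byte() == NBSP)
```

Both arguments to replace are Unicode here, so Python never has to guess an encoding for a byte string, which is what triggers the 'ascii' codec error in the question.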

3 Comments

Perfect, thanks. unichr() works both via Eclipse and not via Eclipse. Weird that chr() and unichr() give the same result when running from Eclipse.
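Your Eclipse configuration may change the default encoding from ASCII to something else. For chr(160) to work, it would have to be a Latin-1-compatible encoding such as cp1252, since a lone 0xa0 byte is not valid UTF-8 either. That's not recommended, for what should now be obvious compatibility reasons: code written in that configuration may not work elsewhere.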
Actually, ASCII (0x00 to 0x7F) is compatible with UTF-8: UTF-8 encodes the first 128 code points as the same single bytes that ASCII uses. However, 0xa0 is not ASCII, hence the error when using chr instead of unichr.
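To make the encoding discussion concrete, here is a small sketch (the helper try_decode is hypothetical) that decodes the lone byte 0xA0 with several codecs. ASCII reproduces the error from the question, UTF-8 rejects it too, and Latin-1 and cp1252 both map it to U+00A0. If an IDE set Python 2's default encoding to a Latin-1-compatible codec, chr(160) would silently behave like unichr(160); whether Eclipse actually does this is an assumption, not something confirmed in this thread.

```python
# Which codecs accept the single byte 0xA0? Works on Python 2.7 and 3.x.
RAW = b"\xa0"  # the byte Python 2's chr(160) produces


def try_decode(raw, encoding):
    """Return the decoded text, or the error message if the codec rejects the bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


if __name__ == "__main__":
    for enc in ("ascii", "utf-8", "latin-1", "cp1252"):
        # %-formatting keeps this a single print argument on Python 2 as well
        print("%s: %r" % (enc, try_decode(RAW, enc)))
```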
