Python: replace nonbreaking space in Unicode

Question

In Python, I have a text that is Unicode-encoded. This text contains non-breaking spaces, which I want to convert to 'x'. Non-breaking spaces are equal to chr(160). I have the following code, which works great when I run it as Django via Eclipse using Localhost. No errors and any non-breaking spaces are converted.

my_text = u"hello"
my_new_text = my_text.replace(chr(160), "x")

However, when I run it any other way (Python command line, Django via runserver instead of Eclipse), I get an error:

'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I guess this error makes sense because it's trying to compare Unicode (my_text) to something that isn't Unicode. My questions are:

If chr(160) isn't Unicode, what is it?
How come this works when I run it from Eclipse? Understanding this would help me determine if I need to change other parts of my code. I have been testing my code from Eclipse.
(most important) How do I solve my original problem of removing the non-breaking spaces? my_text is definitely going to be Unicode.

Fred Foo · Accepted Answer · 2012-07-11 16:23:17Z


In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).

E.g.,

>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
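
As a rough illustration of the first point, here is a minimal sketch in Python 3 (an assumption; the answer above targets Python 2). In Python 3 the bytes/str split makes the distinction explicit: bytes([160]) plays the role of Python 2's chr(160), and chr(160) plays the role of unichr(160). The variable names are only illustrative.

```python
# Python 3 sketch: a single byte 0xa0 only becomes a character
# once an encoding is chosen.
raw = bytes([160])  # one byte with value 0xa0, like chr(160) in Python 2

# Latin-1 maps byte 0xa0 to U+00A0 NO-BREAK SPACE.
as_latin1 = raw.decode("latin-1")

# ASCII only covers 0x00-0x7f, so decoding fails, much like the
# error reported in the question.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# In Python 3, chr() already returns a Unicode string, so the
# replacement works without unichr().
replaced = "hello\u00a0world".replace(chr(160), "X")
```

Under these assumptions, as_latin1 is the no-break space, the ASCII decode raises UnicodeDecodeError, and replaced is 'helloXworld'.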

edited Jul 11, 2012 at 16:23

answered Jul 11, 2012 at 16:17


3 Comments

user984003 Over a year ago

Perfect, thanks. unichr() works both via Eclipse and not via Eclipse. Weird that chr() and unichr() give the same result when running from Eclipse.

Mark Tolonen Over a year ago

Your Eclipse configuration may change the default encoding to UTF8 instead of ASCII. That's not recommended, for what should now be obvious compatibility reasons. Code written in that configuration may not work elsewhere.

dda Over a year ago

Actually ASCII (0x00 to 0x7F) is compatible with UTF-8, since the first 128 codepoints of UTF-8 are the same as ASCII. However, 0xa0 is definitely not ASCII, hence the error while using chr instead of unichr...
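
The relationship between ASCII and UTF-8 mentioned in the comment above can be checked with a short sketch, written in Python 3 as an assumption (the thread itself is about Python 2). It shows that pure-ASCII bytes decode identically under both codecs, that the no-break space takes two bytes in UTF-8, and that a lone 0xa0 byte is not valid UTF-8 on its own.

```python
# Python 3 sketch; names are illustrative only.
ascii_bytes = b"hello"

# Bytes in the range 0x00-0x7f mean the same thing in ASCII and UTF-8.
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# U+00A0 NO-BREAK SPACE is encoded as two bytes in UTF-8.
nbsp_utf8 = "\u00a0".encode("utf-8")

# A lone 0xa0 byte is a UTF-8 continuation byte with no lead byte,
# so decoding it as UTF-8 fails as well.
try:
    b"\xa0".decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False
```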
