8

In Python, if I have a string like:

a = " Hello - to - everybody"

And I do

a.split('-')

then I get

[u'Hello', u'to', u'everybody']

This is just an example.

How can I get a plain list without that annoying u'' prefix?

4
  • 16
    First understand what the u'' is Commented Feb 2, 2013 at 17:01
  • This shows how to convert: stackoverflow.com/questions/1207457/… Commented Feb 2, 2013 at 17:05
  • 2
    Is this your real code? You split a string, and the delimiter is also a string, then the result should be a list of strings, not a list of unicodes. Commented Feb 2, 2013 at 17:19
  • 1
    @nymk I imagine that the asker is using Django, which tends to make everything Unicode wherever possible due to its strong support for different character sets, and that they have oversimplified the question. Commented Feb 2, 2013 at 18:39

3 Answers

21

The u prefix means that the value is a unicode string, so your original string must also have been a unicode string. It is generally a good idea to keep strings as Unicode, because converting to a normal (byte) string can fail when a character has no equivalent in the target encoding.

The u appears only in the string's representation (its repr()) to tell you that it is a unicode string. It is not part of the string's contents and does not affect the string itself.
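
To check that the prefix belongs to the representation and not to the data, here is a minimal sketch. The variable names `parts` and `joined` are illustrative and not from the question, and the input is assumed to be a unicode literal:

```python
# The u prefix is shown only by repr() in Python 2; the text itself never contains it.
a = u"Hello - to - everybody"

# Strip the padding around each piece while splitting.
parts = [piece.strip() for piece in a.split(u"-")]

# Printing or joining elements uses the text itself, so no prefix
# appears in either Python 2 or Python 3.
for piece in parts:
    print(piece)

joined = u", ".join(parts)
print(joined)
```

Printing the whole list calls repr() on each element, which is where the u shows up in Python 2.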

In general, unicode strings work exactly as normal strings, so there should be no issue with leaving them as unicode strings.

In Python 3.x, strings are Unicode by default and their representation has no u prefix. Instead, bytes objects (the closest equivalent to the old str type) are shown with a b prefix.
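
As a small illustration of the Python 3 behaviour, assuming a Python 3 interpreter (the example strings are made up for the demonstration):

```python
# Python 3 only: str is Unicode, bytes is the separate byte-string type.
text_parts = "Hello-to-everybody".split("-")
byte_parts = b"Hello-to-everybody".split(b"-")

text_repr = repr(text_parts)
byte_repr = repr(byte_parts)

print(text_repr)  # ['Hello', 'to', 'everybody']
print(byte_repr)  # [b'Hello', b'to', b'everybody']
```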

If you really, really need to convert to a normal string (rarely the case, but potentially an issue if you are using an extension library that doesn't support unicode strings, for example), take a look at unicode.encode(), and at str.decode() for the reverse direction. You can either do this before the split, or after the split using a list comprehension.
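
Both orderings can be sketched as follows. UTF-8 is an assumed target encoding here, so pick whatever your library expects. The result is a list of str in Python 2 and a list of bytes in Python 3:

```python
a = u"Hello - to - everybody"

# Convert after the split: encode each element individually.
encoded_after = [piece.strip().encode("utf-8") for piece in a.split(u"-")]

# Convert before the split: encode once, then split the byte string.
encoded_before = [piece.strip() for piece in a.encode("utf-8").split(b"-")]

# Encoding can fail when the target codec has no equivalent character.
try:
    u"caf\u00e9".encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

print(encoded_after == encoded_before, failed)
```

The last part shows the failure mode mentioned above: ASCII has no equivalent for é, so the conversion raises UnicodeEncodeError.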

1

I had the opposite problem. The string '第一回\u3000甄士隐梦幻识通灵 贾雨村风尘怀闺秀' needs to be split on a Unicode character. I mistakenly wrote split('\u'), which raised a Unicode syntax error, because \u must be followed by exactly four hexadecimal digits.

The correct call is split('\u3000').
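
A minimal sketch of the corrected call, assuming Python 3 (where '\u3000' in a normal string literal is the ideographic space); the variable names are illustrative:

```python
# U+3000 is the ideographic space; the escape needs all four hex digits.
title = "第一回\u3000甄士隐梦幻识通灵 贾雨村风尘怀闺秀"

parts = title.split("\u3000")
print(parts)  # ['第一回', '甄士隐梦幻识通灵 贾雨村风尘怀闺秀']
```

In Python 2 the same idea would need unicode literals, for example u'\u3000'.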

0

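You can try the following to remove the first '\u3000':

idx = your_string.find(u'\u3000')
# find() returns -1 when the character is absent; leave the string unchanged then
new_string = your_string[:idx] + your_string[idx + 1:] if idx != -1 else your_string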
