  • I have one source file containing UTF-8 characters (names).
  • I have one output file with the same character encoding.
  • I am working with an HTML page: I cut the useful information out of it and paste it into the output file.
  • I use the characters "éáűúőóüöäđĐ" in my "friendsNames" txt file.

And I got this error:

Traceback (most recent call last):
  File "C:\Users\Rendszergazda\workspace\achievements\hiba.py", line 9, in <module>
    s = str(urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf-8')
  File "C:\Python27\lib\encodings\cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

What do you think? What is my problem?

from urllib import urlopen
import codecs

result = codecs.open("C:\Users\Desktop\Achievements\Result.txt", "a", "utf-8")
fh = codecs.open("C:\Users\Desktop\Achievements\FriendsNames.txt", "r", "utf-8")
line = fh.readline()
names = line.split(" ")
fh.close()

s = urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf8')
result.write(str(s))
result.close()
  • Just for information: The character 0xfeff is a BOM. Additionally, your error message and your code sample do not match. Commented Mar 26, 2012 at 11:45
  • If you want to learn more about unicode, I strongly recommend bit.ly/unipain Commented Mar 26, 2012 at 11:46

1 Answer

The problem you're having is that you're calling str(array[0]) (names[0] in your code), where array[0] is a unicode string. This means it gets encoded with the default encoding, which for some reason in your case seems to be cp1250. (Did you mess with sys.setdefaultencoding()? Don't do that.)
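To make the failure concrete, here is a minimal sketch that reproduces the same UnicodeEncodeError without any file or network access. The byte string `raw` and the helper `try_encode` are made up for illustration; they stand in for the first line of a names file that starts with a UTF-8 BOM. The sketch assumes, as the traceback suggests, that the implicit encoding was cp1250.

```python
# -*- coding: utf-8 -*-
# Hypothetical file contents: a UTF-8 BOM followed by one name.
raw = b"\xef\xbb\xbfJ\xc3\xb3zsi"

with_bom = raw.decode("utf-8")         # keeps U+FEFF as the first character
without_bom = raw.decode("utf-8-sig")  # drops a leading BOM if one is present


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# cp1250 has no mapping for U+FEFF, so this fails like the traceback does.
failed = try_encode(with_bom, "cp1250")
# An explicit UTF-8 encode can represent every character, including the BOM.
ok = try_encode(with_bom, "utf-8")
```

Reading the file with the "utf-8-sig" codec instead of "utf-8" is one way to keep the BOM out of the first name in the first place.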

To get byte strings out of unicode, you should explicitly encode the unicode. Don't just call str() on it. Encode it using the encoding the result should have (which in the case of URLs is somewhat difficult to guess at, but in this case is probably UTF-8). So, use `array[0].encode('utf-8')`. You may also need to quote the non-ASCII characters in your URL, although that depends on what the remote end expects.
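The advice above could be sketched roughly as follows. The function name `achievement_url` and the constant `BASE` are hypothetical, and the sketch assumes the server accepts percent-encoded UTF-8 in the path, which the question does not confirm. It also strips a leading BOM, since the traceback complains about u'\ufeff'. The import fallback is only there so the same code runs on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# URL prefix taken from the question.
BASE = "http://eu.battle.net/wow/en/character/arathor/"


def achievement_url(name):
    """Build the achievement URL for a unicode character name.

    Assumption: the remote end expects percent-encoded UTF-8.
    """
    # Drop a leading BOM and surrounding whitespace or newlines.
    name = name.lstrip(u"\ufeff").strip()
    # Encode explicitly instead of calling str() on the unicode object.
    encoded = name.encode("utf-8")
    # Percent-encode the non-ASCII bytes so the URL is plain ASCII.
    return BASE + quote(encoded) + "/achievement"
```

With this in place, the call in the question would become something like `urlopen(achievement_url(names[0])).read()`, with no str() around the name.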


3 Comments

But now I have a new problem: with utf-8, xy.write "eats" my "\n". I tried u"\u000A" (the utf-8 new line), but it does not work :(
I'm afraid I don't understand the problem. u"\u000A" is the same thing as u"\n", and it's unicode, not UTF-8. (See bit.ly/unipain .) I suggest you post a new question describing your current problem.
You're probably on Windows and trying to open your output with Notepad, or something. Notepad only understands \r\n, but Word and Wordpad will display your file just fine.
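If the Notepad explanation is right, one possible workaround is to write Windows line endings explicitly. The sketch below uses io.open with newline="\r\n", which translates every "\n" on write; codecs.open always opens the file in binary mode and does no such translation. The temporary path is a stand-in for the real Result.txt, and the two lines written are placeholder data.

```python
import io
import os
import tempfile

# Hypothetical output location; the question writes to its own Result.txt.
path = os.path.join(tempfile.mkdtemp(), "Result.txt")

# newline="\r\n" makes io.open write "\r\n" for each "\n" in the text.
with io.open(path, "w", encoding="utf-8", newline="\r\n") as out:
    out.write(u"first line\n")
    out.write(u"second line\n")

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as check:
    data = check.read()
```

Note that io.open expects unicode strings on Python 2, so the text passed to write() should not go through str() first.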
