Getting 'ascii' codec error even after encoding with UTF-8

Question

I am posting to GitHub's Markdown API and sending JSON data in the POST request. I discovered that I can't write lists because the characters are not ASCII. When I looked this up, I found that I should always encode. I encoded the text to be rendered and the API works, but I still get the same error when I try to make lists.

The code for the POST method is:

import json
import urllib2

def markDown(to_mark):
    headers = {
        'content-type': 'application/json'
    }
    text = to_mark.decode('utf8')
    payload = {
        'text': text,
        'mode':'gfm'
    }
    data = json.dumps(payload)
    req = urllib2.Request('https://api.github.com/markdown', data, headers)
    response = urllib2.urlopen(req)
    marked_down = response.read()
    return marked_down

And the error that I get when I try making lists is as follows:

'ascii' codec can't decode byte 0xe2 in position 55: ordinal not in range(128)

Here is the full traceback:

Traceback (most recent call last):
  File "/home/bigb/Programming/google_appengine/google/appengine/runtime/wsgi.py", line 266, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1519, in __call__
    response = self._internal_error(e)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 232, in post
    mark_blog = markDown(blog)
  File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
    text = to_mark.decode('utf8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128)

Am I misunderstanding something here? Thanks!

Please include the full traceback. Note that for json.dumps() you do not need to encode; the library handles encoding for you. That is not the cause of your problem.
– Martijn Pieters, Commented Dec 10, 2013 at 10:50
to_mark is the value that comes from a textarea used to submit a new post. I am using Jinja2 and autoescape whatever content is submitted.
– Bhargav, Commented Dec 10, 2013 at 11:21
That doesn't mean that to_mark is a unicode value. It'll be a byte string, encoded by the browser to match your form content type (usually UTF-8).
– Martijn Pieters, Commented Dec 10, 2013 at 11:23
Your traceback has nothing to do with the code you posted. You are storing a bytestring value in a Unicode field, either subject or mark_blog or both.
– Martijn Pieters, Commented Dec 10, 2013 at 11:28
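
The traceback itself contains a further clue. In Python 2, calling .decode() on a value that is *already* unicode makes Python first encode it with the default ASCII codec. That is why a UnicodeEncodeError is raised from inside encodings/utf_8.py on the to_mark.decode('utf8') line. A defensive option is to decode only when the value really is a byte string. The sketch below is written for Python 3, where byte strings are bytes and Unicode values are str. In Python 2 the check would be against str and the result would be unicode. ensure_text is a hypothetical helper name, and UTF-8 as the form encoding is an assumption.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it only if it is bytes.

    Hypothetical helper; assumes byte input is encoded with `encoding`
    (UTF-8 by default, the usual browser form encoding).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)  # encoded bytes -> Unicode text
    return value  # already Unicode text; decoding it again is the bug


# Both a raw form value and an already-decoded value end up as the same text.
assert ensure_text(b"caf\xc3\xa9") == "caf\u00e9"
assert ensure_text("caf\u00e9") == "caf\u00e9"
```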

Martijn Pieters · Accepted Answer · 2013-12-10 10:57:34Z

1

Your to_mark value is not a Unicode value; you already have an encoded byte string there. Trying to encode a byte string tells Python that it should first decode the value to Unicode, using the default ASCII codec, before encoding again. This causes your exception:

>>> '\xc3\xa5'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
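
That implicit ASCII decode step is specific to Python 2. Python 3 separates the two types: bytes only has .decode() and str only has .encode(). The same mistake therefore fails with an AttributeError rather than a codec error. The following minimal Python 3 sketch is a side note and not part of the Python 2 code in the question.

```python
raw = b"\xc3\xa5"  # UTF-8 encoded bytes for U+00E5, a with ring above

error = None
try:
    raw.encode("utf8")  # Python 2 would first try to ASCII-decode raw here
except AttributeError as exc:  # Python 3: bytes has no encode() method at all
    error = exc

text = raw.decode("utf8")  # the correct direction: bytes -> str
```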

For the json.dumps() function, you want to use Unicode values. If to_mark contains UTF-8 data, use str.decode():

text = to_mark.decode('utf8')
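
Put together, the request body goes bytes -> str -> JSON text -> bytes. Below is a minimal Python 3 sketch of that pipeline. It assumes to_mark holds UTF-8 bytes, as this answer does. build_markdown_request is a hypothetical name, and urllib.request stands in for the question's Python 2 urllib2. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the question's code.
GITHUB_MARKDOWN_URL = "https://api.github.com/markdown"


def build_markdown_request(to_mark):
    """Build, but do not send, a POST request for GitHub's Markdown API.

    Hypothetical helper; assumes to_mark holds UTF-8 encoded bytes.
    """
    text = to_mark.decode("utf-8")  # bytes -> str (Unicode)
    payload = {"text": text, "mode": "gfm"}
    body = json.dumps(payload)  # default ensure_ascii=True escapes non-ASCII
    data = body.encode("utf-8")  # str -> bytes, as urllib.request requires
    return urllib.request.Request(
        GITHUB_MARKDOWN_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )


req = build_markdown_request("* caf\u00e9\n* na\u00efve".encode("utf-8"))
print(req.get_method())  # POST
# Sending it would be urllib.request.urlopen(req); omitted here.
```

Because json.dumps escapes non-ASCII characters by default, the encoded body is plain ASCII. This is why the JSON step itself never needs manual encoding.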

answered Dec 10, 2013 at 10:57

Martijn Pieters

2 Comments

Bhargav Over a year ago

So if I am understanding this properly, a byte string is already encoded? And unicode is a decoded version of a byte string? I just tried it out, and I am still getting the same error.

Martijn Pieters Over a year ago

Yes, you are getting there. Python 2 strings (str) are encoded data; unicode values are decoded. It's a bit like strings containing digits versus integers, or strings representing a date and time versus datetime objects. See the Python Unicode HOWTO, and I'd really recommend reading Joel on Unicode and Pragmatic Unicode as well.
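
The analogy can be made concrete. In the following Python 3 sketch, Python 3's str plays the role of Python 2's unicode, and bytes plays the role of Python 2's str. Encoding and decoding convert between the two representations, just as int() and str() convert between a digit string and an integer.

```python
text = "\u00e5"  # decoded: one character, a with ring above
data = text.encode("utf-8")  # encoded: b'\xc3\xa5', two bytes
print(len(text), len(data))  # 1 2
assert data.decode("utf-8") == text  # decoding reverses encoding

digits = "42"  # textual representation of a number
number = int(digits)  # parsed into an int, like decoding
assert str(number) == digits  # and formatted back, like encoding
```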

bruno desthuilliers · Accepted Answer · 2013-12-10 11:44:59Z

1

Your code snippet reads:

text = to_mark.encode('utf-8')

but in the traceback you have:

File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
    text = to_mark.decode('utf8')

Please first make sure you post the real code and traceback (that is, the code that actually raises the exception).

answered Dec 10, 2013 at 11:44

bruno desthuilliers

1 Comment

Bhargav Over a year ago

Sorry, I did not edit it when I changed it to decode following Martijn's suggestion. Edited now.

Sabuj Hassan · Accepted Answer · 2013-12-10 10:50:52Z

0

I can't remember exactly, but I think calling decode/encode on response.read() worked for me when I faced the exact same error.

response.read().decode("utf8")

answered Dec 10, 2013 at 10:50

Sabuj Hassan

1 Comment

Martijn Pieters Over a year ago

The exception applies to a Unicode value being coerced somewhere; why would decoding from bytes to Unicode in another place solve that issue? Even if this is the issue, why would UTF-8 be the right codec here? That is a big assumption to make.
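
On the UTF-8 assumption: rather than hard-coding a codec, the charset can be read from the response's Content-Type header. Below is a minimal Python 3 sketch. decode_body is a hypothetical helper. Falling back to UTF-8 when no charset is declared is an assumption, not something established in this thread. With a real urllib.request response, response.headers is an email.message.Message subclass, so response.headers.get_content_charset() could be called directly.

```python
from email.message import Message


def decode_body(body, content_type, default="utf-8"):
    """Decode an HTTP body using the charset named in its Content-Type header.

    Hypothetical helper; falls back to `default` (an assumption) when the
    header declares no charset.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset(default)  # lower-cased charset, or default
    return body.decode(charset)


assert decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1") == "caf\u00e9"
assert decode_body(b"caf\xc3\xa9", "text/html") == "caf\u00e9"
```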
