I am posting to GitHub's Markdown API, sending JSON data in the POST request. I discovered that I can't write lists because the characters are not part of ASCII, and when I looked it up I found that I should always encode. I encoded the text that needed to be rendered as Markdown and the API call works, but I still get the same error when I try to make lists.

The code for the POST method is:

import json
import urllib2

def markDown(to_mark):
    headers = {
        'content-type': 'application/json'
    }
    # to_mark is the raw text submitted from the form
    text = to_mark.decode('utf8')
    payload = {
        'text': text,
        'mode': 'gfm'
    }
    # Serialize the payload and POST it to GitHub's Markdown endpoint
    data = json.dumps(payload)
    req = urllib2.Request('https://api.github.com/markdown', data, headers)
    response = urllib2.urlopen(req)
    marked_down = response.read()
    return marked_down

And the error that I get when I try making lists is as follows:

'ascii' codec can't decode byte 0xe2 in position 55: ordinal not in range(128)

Added the full traceback:

Traceback (most recent call last):
    File "/home/bigb/Programming/google_appengine/google/appengine/runtime/wsgi.py", line 266, in Handle
      result = handler(dict(self._environ), self._StartResponse)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1519, in __call__
      response = self._internal_error(e)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1511, in __call__
      rv = self.handle_exception(request, response, e)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1505, in __call__
      rv = self.router.dispatch(request, response)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
      return route.handler_adapter(request, response)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1077, in __call__
      return handler.dispatch()
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 547, in dispatch
      return self.handle_exception(e, self.app.debug)
    File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 545, in dispatch
      return method(*args, **kwargs)
    File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 232, in post
      mark_blog = markDown(blog)
    File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
      text = to_mark.decode('utf8')
    File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
      return codecs.utf_8_decode(input, errors, True)
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128)

Am I misunderstanding something here? Thanks!

  • Please include the full traceback. Note that for json.dumps() you do not need to encode; the library handles encoding for you. That is not the cause of your problem. Commented Dec 10, 2013 at 10:50
  • Are you certain that to_mark is a Unicode value? Commented Dec 10, 2013 at 10:54
  • to_mark is the value that comes from a textarea to submit a new post. I am using jinja2 and autoescape whatever content that is submitted. Commented Dec 10, 2013 at 11:21
  • That doesn't mean that to_mark is a unicode value. It'll be a byte string, encoded by the browser to match your form content type (usually UTF8). Commented Dec 10, 2013 at 11:23
  • Your traceback has nothing to do with the code you posted. You are storing a bytestring value in a Unicode field, either subject or mark_blog or both. Commented Dec 10, 2013 at 11:28

3 Answers

Your to_mark value is not a Unicode value; you already have an encoded byte string there. Trying to encode a byte string tells Python that it should first decode the value to Unicode before encoding again. This causes your exception:

>>> '\xc3\xa5'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

For the json.dumps() function, you want to use Unicode values. If to_mark contains UTF-8 data, use str.decode():

text = to_mark.decode('utf8')
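
As a hedged sketch of the advice above: the helper below decodes only when it is handed a byte string and passes text through untouched, so json.dumps() always receives Unicode. The names to_text and build_payload are hypothetical (they are not in the question's code), UTF-8 is an assumption about how the form data was encoded, and the isinstance(value, bytes) check is written to behave the same on Python 2.7, where bytes is an alias for str, and on Python 3.

```python
import json


def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding only if it is a byte string.

    Hypothetical helper; the utf-8 default is an assumption about the input.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_payload(to_mark):
    """Build the JSON body for the Markdown request from bytes or text."""
    payload = {
        'text': to_text(to_mark),
        'mode': 'gfm',
    }
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
    # so the result is plain ASCII and safe to send as the request body.
    return json.dumps(payload)
```

Guarding the decode this way matters on Python 2: calling .decode('utf8') on a value that is already unicode makes Python encode it with the ASCII codec first, which is the kind of UnicodeEncodeError shown at the end of the question's traceback.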

2 Comments

So if I am understanding this properly, a byte string is already encoded? And unicode is a decoded version of a byte string? I just tried it out, and I am still getting the same error.
Yes, you are getting there. Python strings are encoded data; unicode values are decoded. A bit like strings containing digits and integers, or strings representing a date and time versus datetime objects. See the Python Unicode HOWTO, and I'd really read Joel on Unicode and Pragmatic Unicode as well.
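
The encoded/decoded distinction described in the comment above can be seen in a small round trip. This is only an illustrative sketch: the sample text is made up, and the u'' prefix is used so the snippet behaves the same on Python 2.7 and Python 3.

```python
# Illustrative sample only: a bullet character followed by a word.
text = u'\u2022 item'          # decoded: a sequence of Unicode code points
data = text.encode('utf-8')    # encoded: a sequence of bytes

# The bullet takes three bytes in UTF-8, and the first one is 0xe2.
first_byte = bytearray(data)[0]

# Decoding the bytes with the same codec gives back the original text.
round_trip = data.decode('utf-8')
```

Here text has 6 characters while data has 8 bytes, which is why the two types cannot be treated as interchangeable.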

Your code snippet reads:

text = to_mark.encode('utf-8')

but in the traceback you have:

File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
    text = to_mark.decode('utf8')

Please first make sure you post the real code and traceback (that is, the code that actually raises the exception).

1 Comment

Sorry, I did not edit it when I changed it to decode according to Martjin's suggestions. Edited now.

I can't remember exactly, but I think decoding the result of response.read() worked for me when I faced the same error.

response.read().decode("utf8")

1 Comment

The exception applies to a Unicode value being coerced somewhere; why would decoding from bytes to Unicode in another place solve that issue? Even if this is the issue, why would UTF8 work here? That is a big assumption to make.
