Python Requests encoding POST data

Question

Version: Python 2.7.3

Other libraries: Python-Requests 1.2.3, jinja2 (2.6)

I have a script that submits data to a forum, and the problem is that non-ASCII characters appear as garbage. For instance, a name like André Téchiné comes out as AndrÃ© TÃ©chinÃ©.

Here's how the data is submitted:

1) Data is initially loaded from a UTF-8 encoded CSV file like so:

import codecs

entries = []
with codecs.open(filename, 'r', 'utf-8') as f:
    # Skip the header line; each row comes back as a list of unicode strings.
    for row in unicode_csv_reader(f.readlines()[1:]):
        entries.append(dict(zip(csv_header, row)))

unicode_csv_reader is from the bottom of Python CSV documentation page: http://docs.python.org/2/library/csv.html

When I type an entry's name in the interpreter, I see it as u'Andr\xe9 T\xe9chin\xe9'.

2) Next I render the data through jinja2:

tpl = tpl_env.get_template(u'forumpost.html')
rendered = tpl.render(entries=entries)

When I inspect rendered in the interpreter, I again see the same: u'Andr\xe9 T\xe9chin\xe9'

Now, if I write the rendered variable to a file like this, it displays correctly:

with codecs.open('out.txt', 'a', 'utf-8') as f:
    f.write(rendered)

But I must send it to the forum:

3) In the POST request code I have:

params = {u'post': rendered}
headers = {u'content-type': u'application/x-www-form-urlencoded'}
session.post(posturl, data=params, headers=headers, cookies=session.cookies)

session is a Requests session.

And the name is displayed broken in the forum post. I have tried the following:

Leave out headers
Encode rendered as rendered.encode('utf-8') (same result)
rendered = urllib.quote_plus(rendered) (comes out as all %XY)

If I type rendered.encode('utf-8') I see the following:

'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9'

How could I fix the issue? Thanks.

jfs · Accepted Answer · 2013-07-02 08:15:24Z

32

Your client behaves as it should. For example, running nc -l 8888 as a server and making a request:

import requests

requests.post('http://localhost:8888', data={u'post': u'Andr\xe9 T\xe9chin\xe9'})

shows:

POST / HTTP/1.1
Host: localhost:8888
Content-Length: 33
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/1.2.3 CPython/2.7.3

post=Andr%C3%A9+T%C3%A9chin%C3%A9

You can check that it is correct:

>>> import urllib
>>> urllib.unquote_plus(b"Andr%C3%A9+T%C3%A9chin%C3%A9").decode('utf-8')
u'Andr\xe9 T\xe9chin\xe9'

1) Check that the server decodes the request correctly. You could try to specify the charset:
```
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
```
The body contains only ASCII characters, so this shouldn't hurt, and a correct server would ignore any parameters for the x-www-form-urlencoded type anyway. Look for the gory details in "URL-encoded form data".

2) Check that the issue is not a display artefact, i.e., the value is correct but it is displayed incorrectly.
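
To illustrate the display-artefact check: the garbage shown in the question is what results when UTF-8 bytes are decoded as ISO-8859-1. The following is a minimal sketch, not part of the original answer. `repair_mojibake` is a hypothetical helper name, and it only covers this one mix-up (UTF-8 read as ISO-8859-1).

```python
# -*- coding: utf-8 -*-
# Sketch only: repair_mojibake is a hypothetical helper, not a library function.

name = u'Andr\xe9 T\xe9chin\xe9'            # the correct text: André Téchiné

# What a page declared as ISO-8859-1 shows when it is handed UTF-8 bytes.
misread = name.encode('utf-8').decode('iso-8859-1')
# misread is now u'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9', displayed as AndrÃ© TÃ©chinÃ©


def repair_mojibake(text):
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return None if text does not fit it."""
    try:
        repaired = text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    return repaired if repaired != text else None
```

If `repair_mojibake` applied to the text copied from the forum page returns the expected name, the bytes that were sent are most likely fine and the page is interpreting them with a different charset.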

edited Jul 2, 2013 at 8:15

answered Jul 2, 2013 at 6:50


6 Comments

TheMagician Over a year ago

"check the issue is not a display artefact i.e., the value is correct but it displays incorrectly" - Thank you. That's the problem! Unfortunately it's a public forum and I can't change the default encoding. It responds with iso-8859-1 encoding. Can I use rendered.encode('iso-8859-1') or will that break things? Thanks.

jfs Over a year ago

try to set charset in the headers

TheMagician Over a year ago

Sending it as rendered.encode('iso-8859-1') seemed to work so I'll use that. I marked your answer as correct as it pointed in the right direction. Thanks.

Micah Smith Over a year ago

To anyone else who finds this, you can use urllib.parse.quote_from_bytes and urllib.parse.unquote_to_bytes to send a bytes-type over a network without worrying as much about encoding.

jfs Over a year ago

@MicahSmith: the question has the python-2.7 tag. There is no urllib.parse there. Anyway, the input is Unicode (as it should be; use Unicode to represent text inside your programs). Side note: unquote_plus() is used here to convince the OP that requests.post() works correctly; you do not use it in your actual code.
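
The ISO-8859-1 workaround mentioned in the comments above can be sketched with the standard library alone. This is an illustration, not a description of what Requests does internally. `form_body` is a hypothetical helper, and the try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode            # Python 2
except ImportError:
    from urllib.parse import urlencode      # Python 3


def form_body(fields, charset):
    """Percent-encode form fields after encoding the text in the given charset.

    form_body is a hypothetical helper. Characters that the charset cannot
    represent raise UnicodeEncodeError.
    """
    encoded = dict((key.encode(charset), value.encode(charset))
                   for key, value in fields.items())
    return urlencode(encoded)


fields = {u'post': u'Andr\xe9 T\xe9chin\xe9'}
utf8_body = form_body(fields, 'utf-8')          # post=Andr%C3%A9+T%C3%A9chin%C3%A9
latin1_body = form_body(fields, 'iso-8859-1')   # post=Andr%E9+T%E9chin%E9
```

A server that decodes form data as ISO-8859-1 reads `%E9` as é, whereas it reads `%C3%A9` as Ã©. Passing `rendered.encode('iso-8859-1')` as the dictionary value to Requests, as reported above, should yield the same percent-encoded bytes. Text containing characters outside ISO-8859-1 cannot be sent this way.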


dikkini · Answer · 2013-07-02 06:11:12Z

2

Try decoding the bytes from UTF-8 into unicode:

unicode(my_string_variable, "utf8")

or decode and encode:

import jinja2

# gettextfromsomewhere() is assumed to return UTF-8 encoded bytes.
sometext = gettextfromsomewhere().decode('utf-8')
env = jinja2.Environment(loader=jinja2.PackageLoader('jinjaapplication', 'templates'))
template = env.get_template('mypage.html')
print template.render(sometext=sometext).encode('utf-8')
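
A small sketch of the decode-on-input, encode-on-output pattern described here. `to_text` is a hypothetical helper; it assumes byte input is UTF-8 unless told otherwise and leaves values that are already Unicode untouched. That second case matters because, in Python 2, calling unicode(value, "utf8") on a value that is already unicode raises TypeError.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; return text values unchanged."""
    if isinstance(value, bytes):        # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


raw = b'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9'          # UTF-8 bytes, e.g. read from a file
text = to_text(raw)                                   # decode once, at the input boundary
already_text = to_text(u'Andr\xe9 T\xe9chin\xe9')     # already Unicode, returned as is
output = text.encode('utf-8')                         # encode once, at the output boundary
```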

answered Jul 2, 2013 at 6:11
