I'm having trouble sending HTML through JSON.

I've noticed that base64-encoding my string gives different values in different Python versions (2.7 and 3.5).

My string is something like: <html><p>PAÇOCA</p></html>

on Python 2.7:

x = '<html><p>PAÇOCA</p></html>'
base64.b64encode(x)
=> PGh0bWw+PHA+UEGAT0NBPC9wPjwvaHRtbD4=

on Python 3.5:

x = '<html><p>PAÇOCA</p></html>'
base64.b64encode(x)
=> b'PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+'

Why are these values different? How can I make the 3.5 result equal to the 2.7 one?

This is causing problems with the e-mails I receive, because the accents are lost.

  • PGh0bWw+PHA+UEGAT0NBPC9wPjwvaHRtbD4= is the base64 encoding of <html><p>PAÇOCA</p></html> after encoding it in any of cp437, cp850, cp852, cp857, cp858, cp860, cp861, cp863 or cp865. Commented Jun 23, 2017 at 19:31
  • PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+ is the UTF-8 + base64 encoding of <html><p>PAÇOCA</p></html>. Commented Jun 23, 2017 at 19:37
  • (I can't help further since I don't know how Python handles character encodings.) Commented Jun 23, 2017 at 19:37
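
The two comments above can be checked from Python. A minimal sketch, assuming Python 3 and taking cp437 as a representative of the code pages listed:

```python
import base64

# The two values from the question.
legacy = "PGh0bWw+PHA+UEGAT0NBPC9wPjwvaHRtbD4="   # Python 2.7 result
modern = "PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+"   # Python 3.5 result

# Undo base64, then interpret the raw bytes with the suspected codec.
text_from_legacy = base64.b64decode(legacy).decode("cp437")
text_from_modern = base64.b64decode(modern).decode("utf-8")

print(text_from_legacy == text_from_modern)
```

Both values decode to the same text, so the difference lies only in which byte encoding was applied before base64.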

1 Answer

Your example x values are not valid Python, so it is difficult to tell where the code went wrong, but the answer is to use Unicode strings and encode them explicitly to get consistent results. The code below gives the same answer in Python 2 and 3, although Python 3 decorates byte strings with b'' when printed.

Save the source file in the encoding declared via #coding. The source encoding can be any encoding that supports the characters used in the source file. UTF-8 is the typical choice for non-ASCII source code, but I deliberately used a different one to show that it doesn't matter.

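#coding:cp1252
from __future__ import print_function
import base64
# Start from a Unicode string and encode it explicitly, so the bytes are
# UTF-8 regardless of the source file or console encoding.
x = u'<html><p>PAÇOCA</p></html>'.encode('utf8')
enc = base64.b64encode(x)  # bytes in, bytes out on both Python 2 and 3
print(enc)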

Output using Pylauncher to choose the major Python version:

C:\>py -2 test.py
PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+

C:\>py -3 test.py
b'PGh0bWw+PHA+UEHDh09DQTwvcD48L2h0bWw+'
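
Since the goal in the question is to send the HTML through JSON, one way to do that is sketched below, assuming Python 3. The key name body and the helpers to_json_payload and from_json_payload are hypothetical names chosen for illustration; they are not from the question.

```python
import base64
import json

html = "<html><p>PAÇOCA</p></html>"

def to_json_payload(text):
    # Hypothetical helper: text -> UTF-8 bytes -> base64 bytes -> ASCII str.
    # json.dumps cannot serialize bytes, hence the final decode.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"body": encoded})

def from_json_payload(payload):
    # Hypothetical helper: reverse the steps above on the receiving side.
    encoded = json.loads(payload)["body"]
    return base64.b64decode(encoded).decode("utf-8")

payload = to_json_payload(html)
restored = from_json_payload(payload)
print(payload)
print(restored == html)
```

As long as sender and receiver agree on UTF-8, the accented character survives the round trip.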

4 Comments

I think this answer is partially correct. There might be cases where the base64 value produced by Python 2 is already saved in a database. In that case we need a way to get the same base64 encoding in Python 3. Is there any way to do so?
@FareesHussain They are the same. Python 3 just displays byte strings with b'' as an indicator that it is a byte string, but the results in both Python versions are byte strings.
What I mean is to get the PGh0bWw+PHA+UEGAT0NBPC9wPjwvaHRtbD4= base64-encoded value using Python 3.
@FareesHussain .decode() will convert it to a Unicode string, which will print without the bytes notation on Python 3.
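
If values produced by the old Python 2 code are already stored, the same bytes can be reproduced on Python 3 by encoding with the legacy code page instead of UTF-8. A sketch, assuming Python 3 and assuming the stored values came from cp437 bytes, as the first comment on the question suggests; the helper reencode_as_utf8 is a hypothetical name:

```python
import base64

text = "<html><p>PAÇOCA</p></html>"

# Reproduce the Python 2.7 value from the question (cp437 is an assumption).
legacy_value = base64.b64encode(text.encode("cp437")).decode("ascii")
print(legacy_value)

def reencode_as_utf8(stored, legacy_codec="cp437"):
    # Hypothetical migration helper: base64 -> legacy bytes -> text
    # -> UTF-8 bytes -> base64, returned as an ASCII str.
    raw = base64.b64decode(stored)
    return base64.b64encode(raw.decode(legacy_codec).encode("utf-8")).decode("ascii")

print(reencode_as_utf8(legacy_value))
```

Converting stored values once to UTF-8 is likely easier to maintain than continuing to emit code-page-specific bytes, but which option fits depends on what else reads that data.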
