1

I performed socket communication in Python 2 and it worked well; now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send. The only thing I know is that the Python 3 str type is encoded as Unicode UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. But the tricky thing is that a Python 3 socket only sends encoded Unicode bytes or another buffer-interface object, rather than a str holding the raw data as in Python 2. An example follows:

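In Python 2:

fulldata = 'AA060100B155'
# Split the hex string into two-character pairs, one per byte.
datasplit = [fulldata[i:i+2] for i in range(0, len(fulldata), 2)]
senddata = ''
for item in datasplit:
    # Convert each hex pair to the character with that byte value.
    itemdec = chr(int(item, 16))
    senddata += itemdec
print(repr(senddata))
# '\xaa\x06\x01\x00\xb1U', which is the data I need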

In Python 3, it seems the socket can only send encoded bytes produced with senddata.encode(), but that is not the format I want. You can try:

print(senddata.encode('latin-1'))
#b'\xaa\x06\x01\x01\xb2U'

to see the difference between the two senddata values. Interestingly, the data is encoded incorrectly when using utf-8.

The data stored in the Python 3 str is what I need, so my question is: how do I send the data of that string without encoding it? Or how do I get the equivalent of the Python 2 str type in Python 3?

Can anyone help me with this?

2
  • Where does 'ª\x06\x01\x01²U' come from? When I run your code (in Python 2.6.6) , the repr of senddata is '\xaa\x06\x01\x00\xb1U'. Commented Apr 17, 2017 at 12:50
  • Sorry, it is from the Python 3 IDE. Sorry to confuse you. Commented Apr 17, 2017 at 12:58

4 Answers

2

I performed socket communication in Python 2 and it worked well; now I have to make it work in Python 3. I have tried str.encode() with many encodings, but the other side of the network can't recognize what I send.

You have to make sure that whatever you send is decodable by the other side. The first step is to find out which encoding that network/file/socket uses. If, for instance, you send UTF-8 and the client decodes as ASCII, this works only as long as every character is in the ASCII range. But if, say, cp500 is the encoding scheme of your client and you send the string encoded as UTF-8, this won't work. It's better to pass the name of your desired encoding explicitly to functions, because the default encoding of your platform may not necessarily be UTF-8. You can always check the default encoding by calling sys.getdefaultencoding().

The only thing I know is that the Python 3 str type is encoded as Unicode UTF-8 by default, and I'm pretty sure the critical question here is what the format of the Python 2 str type is. I have to send exactly the same thing as what was stored in the Python 2 str. But the tricky thing is that a Python 3 socket only sends encoded Unicode bytes or another buffer-interface object, rather than a str holding the raw data as in Python 2.

Yes, Python 3.X uses UTF-8 as the default encoding, but this is not guaranteed: in some cases the default can differ, so it's better to pass the name of the desired encoding explicitly. Note, though, that str in Python 3.X is the equivalent of unicode + str in 2.X, whereas str in 2.X supports only 8-bit (1-byte, 0-255) characters.

On one hand, your problem seems to be with 3.X and its type distinction between str and bytes. APIs that expect bytes won't accept str in 3.X. This is unlike 2.X, where you can mix unicode and str freely. The distinction in 3.X makes sense, given that str represents decoded strings and is used for textual data, whereas bytes represents encoded strings as raw bytes with absolute byte values.

On the other hand, you have a problem choosing the right encoding for the text you need to pass to the client in 3.X. First, check which encoding your client uses. Second, pass the string encoded with that same scheme so your client can decode it properly: str.encode('same-encoding-as-client').

Because you pass your data as str in 2.X and it works, I suspect your client most likely uses an 8-bit encoding for characters; something like Latin-1 might be the encoding it uses.
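
To make the 8-bit point concrete, here is a minimal Python 3 sketch. It assumes the other side expects the six raw byte values from the question; the names text, latin1_bytes and utf8_bytes are only illustrative.

```python
# Python 3: build the same str that the question's loop produces.
fulldata = 'AA060100B155'
text = ''.join(chr(int(fulldata[i:i + 2], 16)) for i in range(0, len(fulldata), 2))

# Latin-1 maps code points 0-255 one-to-one onto byte values,
# so this reproduces the Python 2 str byte for byte.
latin1_bytes = text.encode('latin-1')

# UTF-8 encodes every code point above 127 as two bytes,
# so the payload grows and no longer matches.
utf8_bytes = text.encode('utf-8')

print(latin1_bytes)  # b'\xaa\x06\x01\x00\xb1U'
print(utf8_bytes)    # b'\xc2\xaa\x06\x01\x00\xc2\xb1U'
```

The two extra \xc2 bytes in the UTF-8 result would explain why a receiver that expects raw bytes rejects the message.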


5 Comments

Thanks for your long explanation! I will have a talk with the client.
@Iwangreen Also see: Unicode HOWTO.
OK, I have one question. Is there a default encoding scheme for the Python 2 str? From your answer I think the answer is no. We actually control hardware through a Python application, and there might be no encoding scheme on the circuit board either. That's probably why I can communicate using a Python 2 str but not with Python 3. What is your opinion?
There's no encoding for str in Python 2.X; it is simply a raw byte string: What encoding do normal python strings use?.
As for your second question: your code works in 2.X when you send str because str is raw data. But I don't see why you can't make your data raw and send it as a bytes object in 3.X. bytes also exists in 2.X for forward compatibility, where it is simply an alias for str. Interfacing with hardware would likely require raw data, of course. Isn't str in 2.X raw after all? So if you would like to have 2.X's str in 3.X, it's just called bytes, with minor differences.
2

You can convert the whole string to an integer, then use the integer method to_bytes to convert it into a bytes object:

fulldata = 'AA060100B155'

senddata = int(fulldata, 16).to_bytes(len(fulldata)//2, byteorder='big')
print(senddata)

# b'\xaa\x06\x01\x00\xb1U'

The first parameter of to_bytes is the number of bytes, the second (required) is the byteorder. See int.to_bytes in the official documentation for reference.
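
As a rough illustration that the resulting bytes object can be handed to a socket as-is, the sketch below sends it through socket.socketpair(). The pair of local sockets is only a stand-in assumed for this example; the real client and device connection are not shown in the question.

```python
import socket

fulldata = 'AA060100B155'
senddata = int(fulldata, 16).to_bytes(len(fulldata) // 2, byteorder='big')

# socketpair() returns two connected sockets in one process,
# standing in here for the real client and device.
sender, receiver = socket.socketpair()
try:
    sender.sendall(senddata)                 # bytes go out unchanged, no encoding step
    received = receiver.recv(len(senddata))  # read the same number of bytes back
finally:
    sender.close()
    receiver.close()

print(received)  # b'\xaa\x06\x01\x00\xb1U'
```

Passing the length explicitly to to_bytes also keeps any leading zero bytes, which a plain integer conversion would otherwise drop.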

2 Comments

Thanks for your reply, but it doesn't solve my problem. Thanks anyway
You just have to send it, the way you sent the Python2 str before.
2

There are various ways to do this. Here's one that works in both Python 2 and Python 3.

from binascii import unhexlify

fulldata = 'AA060100B155'
senddata = unhexlify(fulldata)
print(repr(senddata))

Python 2 output

'\xaa\x06\x01\x00\xb1U'

Python 3 output

b'\xaa\x06\x01\x00\xb1U'
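
For comparison, a direct Python 3 port of the question's loop might look like the sketch below. It collects integers into a bytes object instead of concatenating chr() results. The name shortcut is illustrative, and bytes.fromhex is a built-in alternative that exists only in Python 3.

```python
from binascii import unhexlify

fulldata = 'AA060100B155'

# Port of the original loop: one integer in range 0-255 per hex pair.
senddata = bytes(int(fulldata[i:i + 2], 16) for i in range(0, len(fulldata), 2))

# Built-in shortcut on Python 3.
shortcut = bytes.fromhex(fulldata)

# All three approaches yield the same six bytes.
print(senddata == shortcut == unhexlify(fulldata))  # True
```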

3 Comments

It is not a matter of what I send; it is all about what the other side can recognize. Do you know what the difference is between '\xaa\x06\x01\x00\xb1U' and the same literal with a 'b' in front of it?
@lwangreen In Python 2, there's no difference. In Python 3, b'\xaa\x06\x01\x00\xb1U' is a bytes string that contains exactly the same bytes as Python 2's b'\xaa\x06\x01\x00\xb1U' or '\xaa\x06\x01\x00\xb1U'. However, '\xaa\x06\x01\x00\xb1U' in Python 3 is the same as u'\xaa\x06\x01\x00\xb1U' (in either Python 2 or Python 3). You can convert that to the previous bytes string using u'\xaa\x06\x01\x00\xb1U'.encode('latin-1'). That works because Latin-1 corresponds to the first 256 Unicode code points, so each character maps to the byte with the same value.
@lwangreen With fulldata = 'AA060100B155' your Python 2 code sends '\xaa\x06\x01\x00\xb1U'. So if your Python 3 code sends the bytes string b'\xaa\x06\x01\x00\xb1U' over the socket they will get exactly the same bytes.
0

The following is Python 2/3 compatible. The unhexlify function converts hexadecimal notation to bytes. Use a byte string and you don't have to deal with Unicode strings. Python 2 uses byte strings by default, but it also recognizes the b'' syntax that Python 3 requires for a byte string.

from binascii import unhexlify
fulldata = b'AA060100B155'
print(repr(unhexlify(fulldata)))

Python 2 output:

'\xaa\x06\x01\x00\xb1U'

Python 3 output:

b'\xaa\x06\x01\x00\xb1U'
