Let's say I have a string variable (Unicode, if it matters) that is less than 100 bytes. I want to create another variable of exactly 100 bytes that contains this string, padded with zeros or whatever. How would I do it in Python 3?
- Is this for displaying the string, or for some other reason? – Cody Piersall, Jun 17, 2014 at 18:57
- @CodyPiersall, I need to send fixed-size bytes over the network so I can assemble the packet on the other side. – Mikael S., Jun 17, 2014 at 18:58
- The byte size of a string depends on the encoding... Are you talking about strings in the sense of 'text' or strings in the sense of data (b'\x00')? – Chris Wesseling, Jun 17, 2014 at 18:59
- @ChrisWesseling, the latter suits me better, I think. – Mikael S., Jun 17, 2014 at 19:00
- If you don't know what it is padded with, how can you distinguish the padding from the actual data on the receiving end? – Scott Hunter, Jun 17, 2014 at 19:29
4 Answers
For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.
Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.
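As a minimal sketch of the struct approach (the names FIELD_SIZE, PACKET_FORMAT, build_packet and parse_packet are hypothetical, and the 2-byte length prefix is an assumption about the wire format, not something the question specifies): the "s" format code pads a short value with null bytes, and a length prefix lets the receiver tell padding from data.

```python
import struct

FIELD_SIZE = 100  # fixed payload size taken from the question

# "!" = network byte order, no alignment padding
# "H" = unsigned 16-bit length prefix (an assumed design choice)
# "100s" = fixed 100-byte field
PACKET_FORMAT = f"!H{FIELD_SIZE}s"


def build_packet(text):
    """Encode text as UTF-8 and pack it into a fixed-size packet."""
    data = text.encode("utf-8")
    if len(data) > FIELD_SIZE:
        raise ValueError("encoded text exceeds the fixed field size")
    # struct pads a short "s" field with null bytes on the right.
    return struct.pack(PACKET_FORMAT, len(data), data)


def parse_packet(packet):
    """Recover the original text using the length prefix."""
    length, field = struct.unpack(PACKET_FORMAT, packet)
    return field[:length].decode("utf-8")


packet = build_packet("具有")
print(len(packet))                     # 102: 2-byte prefix + 100-byte field
print(parse_packet(packet) == "具有")  # True
```

If the protocol only needs the padded field, struct.pack("100s", data) alone yields exactly 100 bytes; the prefix is just one way to make the padding unambiguous.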
1 Comment
Something like this should work:
st = "具有"
by = bytes(st, "utf-8")
by += b"0" * (100 - len(by))  # pads with the ASCII digit "0" (0x30); use b"\x00" for null bytes
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Obligatory addendum since your original post seems to conflate strings with the length of their encoded byte representation: Python unicode explanation
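To make the distinction between characters and encoded bytes concrete, here is a small sketch. The helper name pad_encoded and its parameters are made up for illustration; it assumes UTF-8 and raises an error rather than truncating when the input is too long.

```python
def pad_encoded(text, size=100, encoding="utf-8", fill=b"\x00"):
    """Encode text and right-pad the resulting bytes to exactly `size` bytes."""
    data = text.encode(encoding)
    if len(data) > size:
        raise ValueError("encoded text is longer than the target size")
    # bytes.ljust pads on the right with the given single fill byte.
    return data.ljust(size, fill)


st = "具有"
print(len(st))                  # 2 characters
print(len(st.encode("utf-8")))  # 6 bytes once encoded as UTF-8

padded = pad_encoded(st)
print(len(padded))              # 100
```

The padding is computed from the encoded length, not from len(st), which is why the string is encoded before it is padded.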
Comments
To pad with null bytes you can do it the way they do it in the stdlib base64 module.
some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
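Here bytes(n) produces n null bytes. A sketch of the receiving side, assuming the payload itself never ends in a null byte (that assumption is not guaranteed by the question, and the second half shows what happens when it fails):

```python
some_data = b"foosdsfkl\x05"
null_padded = some_data + bytes(100 - len(some_data))

# Receiving side: strip the trailing null padding.
recovered = null_padded.rstrip(b"\x00")
print(recovered == some_data)  # True

# Caveat: a payload that legitimately ends in a null byte loses it.
tricky = b"abc\x00"
padded_tricky = tricky + bytes(100 - len(tricky))
print(padded_tricky.rstrip(b"\x00") == tricky)  # False
```

When payloads may contain trailing null bytes, sending the real length alongside the data avoids the ambiguity.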
Comments
Here's a roundabout way of doing it:
>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24
Following this, an ASCII string that occupies 100 bytes in memory needs to be 79 characters long:
>>> a = "".join(["a" for i in range(79)])
>>> len(a)
79
>>> sys.getsizeof(a)
100
This approach above is a fairly simple way of "calibrating" strings to figure out their lengths. You could automate a script to pad a string out to the appropriate memory size to account for other encodings.
import sys

def padder(strng):
    TARGETSIZE = 100
    padChar = "0"
    curSize = sys.getsizeof(strng)
    if curSize <= TARGETSIZE:
        # Prepend one pad character per missing byte of in-memory size
        for i in range(TARGETSIZE - curSize):
            strng = padChar + strng
        return strng
    else:
        # Not sure if you need to handle strings that start longer than your target, but you can do that here
        return strng
3 Comments
a = "具有" is 42.
Doesn't sys.getsizeof try to show the size of some object in the VM's memory? That varies from platform to platform and implementation to implementation. On my machine it varies from 43 in Python 2.7, to 78 in Python 3.3, to "TypeError: sys.getsizeof() not implemented on PyPy". I doubt this is what @MikaelS. was after...
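A quick way to see the difference the comment above describes, as a sketch: the in-memory size includes interpreter-specific object overhead, while the number of bytes sent over the network is the length of the encoded data. The variable names are illustrative only.

```python
import sys

text = "a" * 79

# Bytes that would actually go on the wire: length of the encoded data.
encoded_size = len(text.encode("utf-8"))
print(encoded_size)  # 79

# In-memory size of the str object; the value depends on the interpreter
# and version, and some implementations do not support it at all.
try:
    print(sys.getsizeof(text))
except TypeError:
    print("sys.getsizeof is not implemented on this interpreter")
```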