
Let's say I have a string variable (Unicode, if it matters) that is less than 100 bytes. I want to create another variable exactly 100 bytes in size that contains this string, padded with zeros or whatever. How would I do it in Python 3?

  • Is this for displaying the string, or for some other reason? Commented Jun 17, 2014 at 18:57
  • @CodyPiersall, I need to send fixed-size bytes over the network so I can assemble the packet on the other side. Commented Jun 17, 2014 at 18:58
  • The byte size of a string depends on the encoding. Are you talking about strings in the sense of text, or in the sense of data like b'\x00'? Commented Jun 17, 2014 at 18:59
  • @ChrisWesseling, the latter suits me better I think. Commented Jun 17, 2014 at 19:00
  • If you don't know what it is padded with, how can you distinguish the padding from the actual data on the receiving end? Commented Jun 17, 2014 at 19:29
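The last comment raises the key question for a fixed-size packet. One common answer, offered here as a minimal sketch rather than anything from the thread, is to store the real payload length in front of the data so the receiver can ignore the padding. The 2-byte big-endian header and the helper names `frame_fixed` and `unframe_fixed` are hypothetical choices for illustration.

```python
PACKET_SIZE = 100   # total bytes on the wire (from the question)
HEADER_SIZE = 2     # assumption: 2-byte big-endian length prefix


def frame_fixed(data: bytes) -> bytes:
    """Hypothetical helper: length prefix + data + NUL padding, exactly PACKET_SIZE bytes."""
    capacity = PACKET_SIZE - HEADER_SIZE
    if len(data) > capacity:
        raise ValueError(f"payload is {len(data)} bytes; at most {capacity} fit")
    header = len(data).to_bytes(HEADER_SIZE, "big")
    return header + data + bytes(capacity - len(data))  # bytes(n) = n NUL bytes


def unframe_fixed(packet: bytes) -> bytes:
    """Hypothetical helper: read the length prefix and return only the real payload."""
    length = int.from_bytes(packet[:HEADER_SIZE], "big")
    return packet[HEADER_SIZE:HEADER_SIZE + length]


payload = "具有".encode("utf-8") + b"\x00"  # payload that itself ends in a NUL byte
packet = frame_fixed(payload)
print(len(packet))                         # 100
print(unframe_fixed(packet) == payload)    # True: the trailing NUL survives
```

Because the receiver trusts the header rather than scanning for padding, the padding byte can be anything and the data may contain NUL bytes.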

4 Answers


For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.

Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
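As a rough illustration of the struct approach (a sketch, not code from the answer): the `s` format code with a count, such as `"100s"`, packs a bytes object into a fixed-width field, padding it with NUL bytes or truncating it as needed. The integer field and its value 7 below are a hypothetical message type, included only to show a binary value packed alongside the string.

```python
import struct

text = "具有"
data = text.encode("utf-8")        # 6 bytes of UTF-8

# "100s": a 100-byte field; struct pads shorter input with NUL bytes
# (and silently truncates longer input)
field = struct.pack("100s", data)
print(len(field))                  # 100

# ">" = big-endian with no alignment padding; "I" = 4-byte unsigned int,
# used here for a hypothetical message type
packet = struct.pack(">I100s", 7, data)
print(len(packet), struct.calcsize(">I100s"))  # 104 104

msg_type, raw = struct.unpack(">I100s", packet)
# Stripping trailing NULs assumes the payload itself never ends in b"\x00"
print(msg_type, raw.rstrip(b"\x00").decode("utf-8"))  # 7 具有
```

Because `struct` truncates without complaint, an over-long UTF-8 string can be cut in the middle of a multi-byte character, so it may be worth checking `len(data)` before packing.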

Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.


1 Comment

Although this is not directly an answer to my question, it's exactly what I wanted. Thanks.

Something like this should work:

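st = "具有"
by = bytes(st, "utf-8")          # same as st.encode("utf-8"); 6 bytes here
# Pads with the ASCII character "0" (0x30), not NUL bytes.
# If by is already 100 bytes or longer, the multiplier is <= 0 and nothing is added.
by += b"0" * (100 - len(by))
print(by)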
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'

Obligatory addendum, since your original post seems to conflate a string's length with the length of its encoded byte representation: Python unicode explanation
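To make the distinction concrete, here is a short sketch (not part of the original answer) comparing the character count of the question's example string with its encoded size in two encodings:

```python
text = "具有"
print(len(text))                      # 2 characters (code points)
print(len(text.encode("utf-8")))      # 6 bytes: 3 per character in UTF-8
print(len(text.encode("utf-16-le")))  # 4 bytes: 2 per character in UTF-16 (no BOM)
```

Any fixed-size padding has to be computed from the encoded length, since that is what actually goes on the wire.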


To pad with null bytes, you can do it the way the stdlib base64 module does:

some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
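A possible variation, sketched here rather than taken from base64: `bytes.ljust` with an explicit fill byte produces the same NUL padding, but behaves differently when the input is already too long. The receiving-side `rstrip` at the end is only safe if the real data can never end in a NUL byte.

```python
some_data = b'foosdsfkl\x05'

# bytes.ljust with an explicit NUL fill byte gives the same result
null_padded = some_data.ljust(100, b"\x00")
print(null_padded == some_data + bytes(100 - len(some_data)))  # True

# Difference on oversized input: bytes(n) raises ValueError for n < 0,
# while ljust simply returns the (still too long) input unchanged
too_long = b"x" * 120
print(len(too_long.ljust(100, b"\x00")))  # 120

# Receiving side: strip the padding (only safe if the data never ends in NUL)
print(null_padded.rstrip(b"\x00"))        # b'foosdsfkl\x05'
```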


Here's a roundabout way of doing it:

>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24

So, following this, an ASCII string of 100 bytes will need to be 79 characters long:

>>> a = "a" * 79
>>> len(a)
79
>>> sys.getsizeof(a)
100

The approach above is a fairly simple way of "calibrating" strings to figure out their lengths. You could write a script that pads a string out to the appropriate memory size to account for other encodings.

import sys

def padder(strng):
    TARGETSIZE = 100
    padChar = "0"

    # sys.getsizeof measures the str object in memory, not its encoded bytes
    curSize = sys.getsizeof(strng)

    if curSize <= TARGETSIZE:
        # Prepend one padChar for each byte still missing
        return padChar * (TARGETSIZE - curSize) + strng
    else:
        return strng  # Not sure if you need to handle strings that start longer than your target, but you can do that here

3 Comments

Thanks, but the size of a = "具有" is 42.
Doesn't sys.getsizeof try to show the size of some object in the VM's memory? That varies from platform to platform and implementation to implementation. On my machine it varies from 43 in Python 2.7, to 78 in Python 3.3, to TypeError: sys.getsizeof() not implemented on PyPy. I doubt this is what @MikaelS. was after...
@ChrisWesseling: Yeah. As if it's not bad enough that the answer's own text starts with "Here's a roundabout way of doing it", that way turns out to be completely wrong anyway.
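To see the commenters' point for yourself, this sketch (an illustration, not from the thread) prints `sys.getsizeof` next to the length of the UTF-8 encoding for a few strings. Only the second number is the count of bytes that would go over the network; the first depends on the Python version and implementation (and is unavailable on PyPy, per the comment above), so no expected values are shown.

```python
import sys

for s in ["a", "aa", "具有"]:
    encoded = s.encode("utf-8")
    # sys.getsizeof: size of the str object in memory (implementation-specific)
    # len(encoded): number of bytes actually sent for this string
    print(repr(s), sys.getsizeof(s), len(encoded))
```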
