Split a UTF-8 string into bytes in Python

Question

I am trying to split a UTF-8 string into bytes in Python 3. The problem is that whenever I use bytearray(), bytes(), encode(), etc., sys.getsizeof() reports 14 bytes for each element, not 1 byte as I expected. I need to split any text file into a sequence of bytes and send them one byte at a time over sockets. I tried something like this:

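import sys

with open(file, "r") as infile:  # file: path to the text file
    text = infile.read()  # avoid naming this str, which shadows the built-in
byte_str = bytes(text, "UTF-8")
print("size of byte_str", sys.getsizeof(byte_str[0]))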

The print call outputs 14, but I expected 1. Any suggestions?
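For reference, here is a minimal sketch of the splitting step itself, assuming Python 3.7+ (the helper name utf8_bytes is hypothetical). Encoding a str gives a bytes object whose elements are ints from 0 to 255; slicing one position at a time keeps each piece as a 1-byte bytes object. Note that a single character can span several bytes in UTF-8.

```python
from __future__ import annotations


def utf8_bytes(text: str) -> list[bytes]:
    """Split text into its UTF-8 bytes, each as a length-1 bytes object."""
    encoded = text.encode("utf-8")  # bytes; each element is an int in 0..255
    # Slicing (rather than indexing) keeps every piece as a bytes object.
    return [encoded[i:i + 1] for i in range(len(encoded))]


parts = utf8_bytes("aé")
print(parts)       # [b'a', b'\xc3', b'\xa9']
print(len(parts))  # 3, because 'é' (U+00E9) takes two bytes in UTF-8
```

Joining the pieces with b"".join(...) and decoding the result as UTF-8 restores the original text.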

You could open the file with "rb" to get a bytes object from read(). – tynn, Commented Apr 23, 2015 at 9:00
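Following that suggestion, a minimal sketch that opens the file in "rb" mode and sends it one byte at a time. The names send_file_bytewise and demo are hypothetical, and socket.socketpair() stands in for a real network connection so the example can run locally.

```python
import os
import socket
import tempfile


def send_file_bytewise(path: str, sock: socket.socket) -> int:
    """Send a file's contents one byte at a time; return the byte count."""
    with open(path, "rb") as infile:  # "rb": read() returns bytes, not str
        data = infile.read()
    for i in range(len(data)):
        sock.sendall(data[i:i + 1])  # each call sends a length-1 bytes object
    return len(data)


def demo(text: str = "zażółć") -> bytes:
    """Hypothetical round trip: write text to a temporary file, send it
    through a connected socket pair, and return the bytes that arrived."""
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write(text)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_bytewise(tmp.name, sender)
        received = b""
        while len(received) < sent:  # recv() may return fewer bytes than sent
            received += receiver.recv(1024)
        return received
    finally:
        sender.close()
        receiver.close()
        os.remove(tmp.name)


print(demo())
```

TCP is a byte stream with no message boundaries, so the receiver cannot tell whether the bytes were sent one at a time or all at once; a single sock.sendall(data) delivers the same byte sequence with far fewer system calls.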

Łukasz Rogalski · Accepted Answer · 2015-04-23 08:49:10Z

Quoting the official documentation:

sys.getsizeof(object[, default])

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

If given, default will be returned if the object does not provide means to retrieve the size. Otherwise a TypeError will be raised.

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.
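Applied to the question, the quoted documentation means that indexing a bytes object returns a Python int, and sys.getsizeof() reports the memory used by that entire int object, interpreter overhead included, rather than the single byte of data it represents. The exact number varies with the Python version and platform. A small sketch (the sample string is arbitrary) contrasting the two:

```python
import sys

data = "hello".encode("utf-8")

first = data[0]    # indexing yields an int (104 for 'h'), not a bytes object
chunk = data[0:1]  # slicing yields a bytes object of length 1

print(type(first).__name__, first)  # int 104
print(type(chunk).__name__, chunk)  # bytes b'h'

# len() counts payload bytes; getsizeof() measures the whole Python object,
# including interpreter overhead, so its value depends on the build.
print(len(data), len(chunk))  # 5 1
print(sys.getsizeof(first) > 1, sys.getsizeof(chunk) > 1)  # True True
```

To count or transmit bytes, use len() and slices of the bytes object; sys.getsizeof() is only useful for measuring memory usage.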
