Python: Zero Copy while truncating a byte buffer

Question

This is a noob question on Python.

Is there a way in Python to truncate a few bytes from the beginning of a bytearray without copying the content to another memory location? This is what I am doing:

inbuffer = bytearray()
inbuffer.extend(someincomingbytedata)
x = inbuffer[0:10]
del inbuffer[0:10]

I need to retain the truncated bytes (referenced by x) and perform some operation on them.

Will x point to the same memory location as inbuffer[0], or will the third line of the code above make a copy of the data? Also, if no copy is made, will the deletion in the last line also delete the data referenced by x? Since x still references that data, the GC should not reclaim it. Is that right?

Edit:

If this is not the right way to truncate a byte buffer and return the truncated bytes without copying, is there any other type that supports such an operation safely?

Nikratio · Accepted Answer · 2015-08-26 18:06:51Z

In your example, x will be a new object that holds a copy of the contents of inbuffer[0:10].
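A more direct way to confirm the copy than comparing object identities is to mutate the slice and see whether the original changes. This is only a small sketch with made-up sample data:

```python
inbuffer = bytearray(b"0123456789abcdef")  # sample data, purely illustrative
x = inbuffer[0:10]        # slicing a bytearray builds a new bytearray

x[0] = ord("X")           # mutate the slice only
print(x)                  # the slice now starts with b'X'
print(inbuffer)           # the original still starts with b'0'

del inbuffer[0:10]        # dropping the prefix does not touch x either
print(x, inbuffer)
```

Because x owns its own memory, the later del is safe, but the ten bytes have been copied once.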

To get a representation without copying, you need to use a memoryview (available in Python 2.7 and Python 3):

inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

Now prefix will point to the first 10 bytes of inbuffer, and suffix will point to the remaining contents of inbuffer. Both objects keep an internal reference to inbuffer, so you do not need to explicitly keep references to inbuffer or inbuffer_view.
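The sharing can be checked with a short experiment. The sample data is invented, and the .obj attribute used here assumes Python 3.3 or later. One consequence worth knowing: as long as any view is alive, the bytearray cannot be resized, so the del from the question raises BufferError.

```python
inbuffer = bytearray(b"0123456789abcdef")  # sample data, purely illustrative
inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

prefix[0] = ord("X")                 # write through the view
first_byte = inbuffer[0]             # the bytearray sees the change

shares_owner = prefix.obj is inbuffer  # each view references the bytearray

# While views exist, resizing the underlying bytearray is refused.
try:
    del inbuffer[0:10]
    resize_blocked = False
except BufferError:
    resize_blocked = True

print(first_byte == ord("X"), shares_owner, resize_blocked)
```

So with memoryviews you "truncate" by keeping the suffix view rather than by deleting from the bytearray.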

Note that both prefix and suffix will be memoryviews, not bytearrays or bytes. You can create bytes and bytearrays from them, but at that point the contents will be copied.
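To illustrate the difference between the view and a bytes object built from it, here is a minimal sketch with invented sample data. The name snapshot is just for illustration.

```python
inbuffer = bytearray(b"0123456789abcdef")  # sample data, purely illustrative
prefix = memoryview(inbuffer)[0:10]

snapshot = bytes(prefix)     # copies the ten bytes into a new immutable object
inbuffer[0] = ord("X")       # in-place change; no resize, so allowed with a live view

print(prefix.tobytes())      # follows the change made to inbuffer
print(snapshot)              # keeps the old contents
```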

memoryviews can be passed to any function that works with objects that implement the buffer protocol. So, for example, you can write them directly into a file using fh.write(suffix).
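As a rough sketch of that, the following writes the suffix to a file-like object without an intermediate bytes copy. io.BytesIO stands in for a real binary file here, and the with block is an addition of mine: it releases the views afterwards so that the bytearray can be resized again.

```python
import io

inbuffer = bytearray(b"0123456789abcdef")  # sample data, purely illustrative
fh = io.BytesIO()                          # stand-in for a file opened in binary mode

# Views are context managers; leaving the block releases them.
with memoryview(inbuffer) as view, view[10:] as suffix:
    fh.write(suffix)                       # accepts any bytes-like object

del inbuffer[:10]                          # allowed again, no views remain
print(fh.getvalue(), inbuffer)
```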

Blckknght · Accepted Answer · 2014-07-24 09:54:33Z

You can use the iterator protocol and itertools.islice to pull the first 10 values out of your someincomingbytedata iterable before putting the rest into inbuffer. This doesn't use the same memory for all the bytes, but it's about as good as you can get at avoiding unnecessary copying with a bytearray:

import itertools

it = iter(someincomingbytedata)
x = bytearray(itertools.islice(it, 10)) # consume the first 10 bytes
inbuffer = bytearray(it)                # consume the rest
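For a concrete run, assuming Python 3 (where iterating over bytes yields ints) and a made-up value for someincomingbytedata:

```python
import itertools

someincomingbytedata = b"0123456789abcdef"  # hypothetical sample input

it = iter(someincomingbytedata)
x = bytearray(itertools.islice(it, 10))  # first 10 bytes
inbuffer = bytearray(it)                 # whatever is left in the iterator

print(x, inbuffer)
```

Each byte is still copied once into one of the two bytearrays; the split just happens while loading instead of afterwards.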

If you really do need to do your reading all up front and then efficiently view various slices of it without copying, you might consider using numpy. If you load your data into a numpy array, any slices you take later will be views into the same memory:

import numpy as np

inbuffer = np.array(someincomingdata, dtype=np.uint8)  # load data into an array of bytes
x = inbuffer[:10]  # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:]  # change inbuffer to reference a slice; no copying here either


5 Comments

Vivek Madani Over a year ago

What I am trying to achieve is the ability to first save the entire incoming data into a buffer and then, at some later point, truncate the first N bytes and return a reference to them without copying the truncated bytes to some other memory location. Let me take a look at numpy; that looks like what I want.

Vivek Madani Over a year ago

Was wondering what value does numpy add here? Can't I just do:

inbuffer = bytearray(someincomingdata)
x = inbuffer[:10]  # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:]  # change inbuffer to reference a slice; no copying here either

Am I missing anything?

Blckknght Over a year ago

Both slicing from a bytearray and deleting elements from anywhere other than the end require an O(N) copy of the data. numpy returns a "view" of the fixed data when you slice an array, which is exactly what you were asking for.

Vivek Madani Over a year ago

Just so that I understand this correctly, if I am changing inbuffer to reference the slice starting from offset 10 (last statement in your code above) - will the first ten bytes be GCed when x goes out of scope?

Blckknght Over a year ago

@VivekMadani: I suspect they will not be cleaned up, but I'm not actually sure how numpy's data sharing plays with the garbage collector. It may have heuristics that depend on the amount of data involved (e.g. 10 bytes: no; 10MB: yes).

freakish · Accepted Answer · 2014-07-24 10:02:28Z

It is very easy to check:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> x = inbuffer[0:2]
>>> print(id(x) == id(inbuffer))
False

So it is not the same object.

You also ask whether x points at inbuffer[0]. You seem to misunderstand something: arrays in Python don't work the same way as arrays in C. The address of inbuffer is not the address of inbuffer[0]:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer) == id(inbuffer[0]))
False

These are wrappers around C-level arrays.

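Also, in Python everything is an object, and CPython caches small integers (-5 through 256, which covers the 0 to 255 range of a bytearray element). That is why indexing hands back the very same int object as the literal 1. Note that the bytearray itself stores raw bytes rather than pointers to int objects, so what a slice copies is those bytes: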

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer[0]) == id(1))
True
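One caveat worth checking: the identity test above only says something about the int object produced by indexing, not about how the bytearray stores its contents. A rough way to see that a bytearray holds raw bytes inline, whereas a list holds one pointer per element, is to compare container sizes. The exact numbers are CPython implementation details, so treat this as a sketch.

```python
import sys

n = 100000
as_bytearray = bytearray(n)   # n zero bytes stored inline
as_list = [0] * n             # n references to an int object

bytearray_size = sys.getsizeof(as_bytearray)
list_size = sys.getsizeof(as_list)

# Roughly one byte per element versus one pointer per element.
print(bytearray_size, list_size)
```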

edited Jul 24, 2014 at 10:02

answered Jul 24, 2014 at 9:18


2 Comments

Vivek Madani Over a year ago

Won't id() return a unique identifier for any newly created object? The Python docs say that it does not necessarily refer to the memory address of the object unless the implementation is CPython.

freakish Over a year ago

@VivekMadani It doesn't matter how it is implemented. Note that the only property of id() I've used is that it is unique on objects (at least while they are simultaneously alive). Thus these two objects cannot occupy the same place in memory.
