
This is a noob question on Python.

Is there a way in Python to truncate a few bytes from the beginning of a bytearray without copying the content to another memory location? This is what I am doing:

inbuffer = bytearray()
inbuffer.extend(someincomingbytedata)
x = inbuffer[0:10]
del inbuffer[0:10]

I need to retain the truncated bytes (referenced by x) and perform some operation on it.

Will x point to the same memory location as inbuffer[0], or will the third line in the code above make a copy of the data? Also, if no copy is made, will the deletion in the last line also delete the data referenced by x? Since x still references that data, the GC should not reclaim it. Is that right?

Edit:

If this is not the right way to truncate a byte buffer and return the truncated bytes without copying, is there any other type that supports such an operation safely?

3 Answers


In your example, x will be a new object that holds a copy of the contents of inbuffer[0:10].
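A quick way to see the copy is to mutate the original after slicing. This is a minimal sketch; the byte string is made-up sample data standing in for the incoming bytes.

```python
# Sample data (hypothetical stand-in for someincomingbytedata).
inbuffer = bytearray(b"0123456789abcdef")

x = inbuffer[0:10]      # slicing a bytearray builds a new bytearray
del inbuffer[0:10]      # removes the prefix from inbuffer only

inbuffer[0] = ord("Z")  # mutate the original after the slice was taken

print(x)         # bytearray(b'0123456789')
print(inbuffer)  # bytearray(b'Zbcdef')
```

Because x owns its own storage, neither the del nor the later assignment affects it.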

To get a representation without copying, you need to use a memoryview (available in Python 2.7 and Python 3):

inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

Now prefix will point to the first 10 bytes of inbuffer, and suffix will point to the remaining contents of inbuffer. Both objects keep an internal reference to inbuffer, so you do not need to explicitly keep references to inbuffer or inbuffer_view.
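The sharing can be checked directly, and it has one consequence worth knowing: in CPython, a bytearray cannot be resized while a memoryview of it is alive. The sketch below assumes Python 3 (it uses memoryview.release()) and made-up sample data.

```python
inbuffer = bytearray(b"0123456789abcdef")  # hypothetical sample data
inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

# A write through the bytearray is visible through the view: same memory.
inbuffer[0] = ord("X")
seen = bytes(prefix)
print(seen)  # b'X123456789'

# While any view is alive, resizing the bytearray is refused.
try:
    del inbuffer[0:10]
    resized = True
except BufferError:
    resized = False
print(resized)  # False

# Once every view is released, the bytearray can be resized again.
prefix.release()
suffix.release()
inbuffer_view.release()
del inbuffer[0:10]
print(inbuffer)  # bytearray(b'abcdef')
```

So with memoryviews you would normally leave the bytearray alone and hand out prefix and suffix views instead of deleting from it.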

Note that both prefix and suffix will be memoryviews, not bytearrays or bytes. You can create bytes and bytearrays from them, but at that point the contents will be copied.

memoryviews can be passed to any function that works with objects that implement the buffer protocol. So, for example, you can write them directly into a file using fh.write(suffix).
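As an illustration of the last two paragraphs, the sketch below writes a view into an in-memory file and then makes an explicit copy. io.BytesIO is used here only as a stand-in for a real file opened in binary mode, and the data is made up.

```python
import io

inbuffer = bytearray(b"0123456789abcdef")  # hypothetical sample data
view = memoryview(inbuffer)
prefix, suffix = view[:10], view[10:]

fh = io.BytesIO()        # stand-in for open(path, "wb")
fh.write(suffix)         # accepts any object implementing the buffer protocol
written = fh.getvalue()
print(written)           # b'abcdef'

copied = bytes(prefix)   # explicit copy: independent of inbuffer from here on
inbuffer[0] = ord("X")
print(copied)            # b'0123456789'
print(prefix[0])         # 88, the view still tracks inbuffer
```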



You can use the iterator protocol and itertools.islice to pull the first 10 values out of your someincomingbytedata iterable before putting the rest into inbuffer. This doesn't use the same memory for all the bytes, but it's about as good as you can get at avoiding unnecessary copying with a bytearray:

import itertools

it = iter(someincomingbytedata)
x = bytearray(itertools.islice(it, 10)) # consume the first 10 bytes
inbuffer = bytearray(it)                # consume the rest

If you really do need to do your reading all up front and then efficiently view various slices of it without copying, you might consider using numpy. If you load your data into a numpy array, any slices you take later will be views into the same memory:

import numpy as np

inbuffer = np.array(someincomingdata, dtype=np.uint8)  # load data into an array of bytes
x = inbuffer[:10]  # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:]  # change inbuffer to reference a slice; no copying here either
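One caveat: np.array copies its input by default, so the load step itself is a copy. If the data already sits in a bytes-like object, np.frombuffer can wrap it without copying. The sketch below swaps that call in as an assumption about what is wanted; raw is a hypothetical stand-in for the incoming data.

```python
import numpy as np

raw = bytearray(b"0123456789abcdef")           # hypothetical incoming data
inbuffer = np.frombuffer(raw, dtype=np.uint8)  # wraps raw's memory, no copy

x = inbuffer[:10]     # view of the first ten bytes
rest = inbuffer[10:]  # view of the remainder

shared = np.shares_memory(x, inbuffer)
print(shared)         # True

raw[0] = ord("X")     # a write to the source shows up in the view
print(int(x[0]))      # 88
```

A view holds a reference to the array it was sliced from, so the whole underlying buffer stays allocated for as long as any view of it exists.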

5 Comments

What I am trying to achieve is the ability to save the entire incoming data into a buffer first and then, at some later point, truncate the first N bytes and return a reference to them without copying the truncated bytes to another memory location. Let me take a look at numpy; that looks like what I want.
Was wondering what value numpy adds here. Can't I just do the following?
inbuffer = bytearray(someincomingdata)
x = inbuffer[:10]  # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:]  # change inbuffer to reference a slice; no copying here either
Am I missing anything?
Both slicing from a bytearray and deleting elements from anywhere other than the end require an O(N) copy of the data. numpy returns a "view" of the fixed data when you slice an array, which is exactly what you were asking for.
Just so that I understand this correctly, if I am changing inbuffer to reference the slice starting from offset 10 (last statement in your code above) - will the first ten bytes be GCed when x goes out of scope?
@VivekMadani: They will not be cleaned up separately. A numpy view keeps a reference to the array it was sliced from, so the whole underlying buffer stays allocated until every view of it is gone; the first ten bytes are not freed on their own when x goes out of scope.

It is very easy to check:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> x = inbuffer[0:2]
>>> print(id(x) == id(inbuffer))
False

So it is not the same object.

Also you are asking about x pointing at inbuffer[0]. You seem to misunderstand something. Arrays in Python don't work the same way as arrays in C. The address of inbuffer is not the address of inbuffer[0]:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer) == id(inbuffer[0]))
False

These are wrappers around C-level arrays.

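Also, in Python everything is an object, and CPython caches small integers (-5 through 256, which covers the whole range of a bytearray element). Indexing a bytearray therefore returns one of those cached int objects, as the check below shows. This does not mean the bytearray stores pointers: it holds the raw bytes in a contiguous C array, and slicing copies those bytes into a new bytearray.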

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer[0]) == id(1))
True

2 Comments

Won't id() return a unique identifier for any new object created - Python doc says that this does not necessarily refer to the memory referenced by the object unless it is CPython.
@VivekMadani It doesn't matter how it is implemented. Note that the only property of id() I've used is that it is unique on objects (at least while they are simultaneously alive). Thus these two objects cannot occupy the same place in memory.
