By default, NumPy ndarrays are stored as contiguous one-dimensional blocks of memory.

This Stack Overflow question (Copying bytes in Python from Numpy array into string or bytearray) suggests that indexing the rows of a matrix returns 'views' into the original array, and that no new array objects are allocated in memory when indexing.

However, I'm getting unexpected results when I look at the addresses of individual rows of a NumPy matrix. For example,

>>> import numpy
>>> x = numpy.random.rand(2,2)
>>> x.data
   <read-write buffer for 0x7fdc92a941c0, size 32, offset 0 at 0x7fdc92ad2d30>
>>> x[0,].data
   <read-write buffer for 0x7fdc92a94a30, size 16, offset 0 at 0x7fdc929e7270>
>>> x[1,].data
   <read-write buffer for 0x7fdc92a94080, size 16, offset 0 at 0x7fdc929e7030>

Using GDB, I found that the distance between the addresses of x[0,] and x[1,] was 576 bytes and the distance between the addresses of x[0,] and x was 965312 bytes:

(gdb) print 0x7fdc929e7030 - 0x7fdc929e7270
$1 = -576
(gdb) print 0x7fdc92ad2d30 - 0x7fdc929e7270
$2 = 965312

If no new array objects are created when indexing the rows of matrix x, why are these array views so far apart in memory? Does the distance between array views in memory destroy the cache performance of matrix operations that operate across rows/columns?

2 Answers

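.__array_interface__ gives a more useful display than .data: its 'data' entry is a tuple holding the address of the array's first element and a read-only flag.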

In [498]: x=np.random.rand(2,2)
In [499]: x.__array_interface__['data']
Out[499]: (182019952, False)
In [500]: x[0].__array_interface__['data']
Out[500]: (182019952, False)
In [501]: x[0,:].__array_interface__['data']
Out[501]: (182019952, False)
In [502]: x[1,:].__array_interface__['data']
Out[502]: (182019968, False)

x, x[0] and x[0,:] all start at the same address, and x[1,:] starts 16 bytes later, which is exactly one row of two 8-byte floats.

Selecting a single item will produce a different address.

In [504]: x[0,0].__array_interface__['data']
Out[504]: (182020568, False)

This isn't a view; it's an np.float64 scalar, a separate object that holds its own copy of the value.
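The same check can be done without reading raw addresses by eye. The following is a minimal sketch, assuming a C-contiguous float64 array (which is what np.random.rand returns); the helper data_ptr and the variable names are illustrative, not part of NumPy.

```python
import numpy as np

x = np.random.rand(2, 2)  # C-contiguous float64, 8 bytes per element

def data_ptr(a):
    """Address of the first element of a's data (illustrative helper)."""
    return a.__array_interface__['data'][0]

base = data_ptr(x)
row0 = x[0, :]
row1 = x[1, :]

# Row views point into x's own buffer; consecutive rows are one
# row stride (x.strides[0] bytes) apart.
offset0 = data_ptr(row0) - base
offset1 = data_ptr(row1) - base
print(offset0, offset1, x.strides)

# np.shares_memory confirms that the view overlaps x's memory.
print(np.shares_memory(row1, x))

# Indexing a single element returns a NumPy scalar, not a view.
elem = x[0, 0]
print(type(elem))
```

Because the rows sit back to back in one buffer, the addresses of the view objects themselves say nothing about how far apart the row data is.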

The "at 0x7fdc92ad2d30" part of the output doesn't indicate where the storage for the buffer's contents is located. It indicates where the buffer object header is located - the small structure holding some metadata and a pointer to the buffer's contents. The location of the buffer header is completely independent of the location of its contents.

As an example, since x.data is generated on the fly, the "at 0x..." part can say something completely different on each access:

In [7]: x = numpy.arange(8)

In [8]: x.data
Out[8]: <read-write buffer for 0x24a6b60, size 64, offset 0 at 0x7f3a7b3438f0>

In [9]: x.data
Out[9]: <read-write buffer for 0x24a6b60, size 64, offset 0 at 0x7f3a7b343970>
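This can also be checked programmatically. The sketch below assumes Python 3, where x.data is a memoryview rather than the Python 2 buffer object shown above; the variable names are illustrative. It compares the wrapper objects returned by two accesses with the address of the memory they wrap.

```python
import numpy as np

x = np.arange(8)

# Each access to x.data builds a new wrapper object, so the two
# results are expected to be distinct Python objects.
buf_a = x.data
buf_b = x.data
print(buf_a is buf_b)

# The memory they wrap is the same: rebuilding arrays from the
# wrappers (no copy is made) gives back x's own data address.
ptr_a = np.frombuffer(buf_a, dtype=x.dtype).__array_interface__['data'][0]
ptr_b = np.frombuffer(buf_b, dtype=x.dtype).__array_interface__['data'][0]
print(ptr_a == ptr_b == x.ctypes.data)
```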
