Numpy matrix row/column memory layout

Question

Numpy nd-arrays are laid out as contiguous 1-d arrays.

This Stack Overflow conversation (Copying bytes in Python from Numpy array into string or bytearray) suggests that indexing a row of a matrix returns a 'view' into the original array, and that no new array objects are allocated in memory when indexing.

However, I'm getting weird results when I look at address locations of individual rows of a numpy matrix. For example,

>> x = numpy.random.rand(2,2)
>> x.data
   <read-write buffer for 0x7fdc92a941c0, size 32, offset 0 at 0x7fdc92ad2d30>
>> x[0,].data
   <read-write buffer for 0x7fdc92a94a30, size 16, offset 0 at 0x7fdc929e7270>
>> x[1,].data
   <read-write buffer for 0x7fdc92a94080, size 16, offset 0 at 0x7fdc929e7030>

Using GDB, I found that the distance between the addresses of x[0,] and x[1,] was 576 bytes and the distance between the addresses of x[0,] and x was 965312 bytes:

>> (gdb) print 0x7fdc929e7030 - 0x7fdc929e7270
$1 = -576
>> (gdb) print 0x7fdc92ad2d30 - 0x7fdc929e7270
$2 = 965312

If no new array objects are created when indexing the rows of matrix x, why are these array views so far apart in memory? Does the distance between array views in memory destroy the cache performance of matrix operations that operate across rows/columns?

hpaulj · Accepted Answer · 2016-02-28 18:30:38Z


.__array_interface__ gives a nicer display than .data.

In [498]: x=np.random.rand(2,2)
In [499]: x.__array_interface__['data']
Out[499]: (182019952, False)
In [500]: x[0].__array_interface__['data']
Out[500]: (182019952, False)
In [501]: x[0,:].__array_interface__['data']
Out[501]: (182019952, False)
In [502]: x[1,:].__array_interface__['data']
Out[502]: (182019968, False)

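These addresses are all the same or close together: x[0] and x[0,:] start at the same address as x, and x[1,:] starts just 16 bytes later (182019968 - 182019952), which is one row of two 8-byte floats.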
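
The spacing between rows is what the array's strides predict. Here is a minimal sketch, assuming a C-contiguous float64 array (which is what np.random.rand returns); the helper name data_address is made up for illustration and is not part of NumPy:

```python
import numpy as np

def data_address(a):
    """Address of the first element of `a` (hypothetical helper)."""
    return a.__array_interface__['data'][0]

x = np.random.rand(2, 2)

# strides = number of bytes to step to reach the next row / next column
row_stride, col_stride = x.strides

# Distance in bytes between the starts of the two row views and the two column views
row_offset = data_address(x[1, :]) - data_address(x[0, :])
col_offset = data_address(x[:, 1]) - data_address(x[:, 0])

# A row view is itself contiguous; a column view steps over a whole row each time
row_contiguous = x[0, :].flags['C_CONTIGUOUS']
col_contiguous = x[:, 0].flags['C_CONTIGUOUS']

print(row_stride, col_stride, row_offset, col_offset)
print(row_contiguous, col_contiguous)
```

Under these assumptions the row views sit back to back in one buffer, so taking row views should not by itself scatter the data across memory.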

Selecting a single item will produce a different address.

In [504]: x[0,0].__array_interface__['data']
Out[504]: (182020568, False)

This isn't a view; it's a np.float64 scalar, a separate object that holds its own copy of the value.
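
The difference between a row view and a scalar can be checked directly. This is a small sketch rather than anything from the original session; the variable names are illustrative:

```python
import numpy as np

x = np.random.rand(2, 2)

row = x[0]       # indexing a row gives a view onto x's buffer
item = x[0, 0]   # indexing a single element gives a NumPy scalar
original_value = float(item)

row_shares_memory = np.shares_memory(row, x)
item_is_scalar = isinstance(item, np.float64)

# Writing through the view changes x; the scalar taken earlier is unaffected.
row[0] = 42.0
x_changed = x[0, 0] == 42.0
item_unchanged = float(item) == original_value

print(row_shares_memory, item_is_scalar, x_changed, item_unchanged)
```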


user2357112 · Accepted Answer · 2016-02-28 18:39:54Z


The "at 0x7fdc92ad2d30" part of the output doesn't indicate where the storage for the buffer's contents is located. It indicates where the buffer object header is located - the small structure holding a bunch of metadata and a pointer to the buffer's contents. The location of the buffer header is completely independent of the location of its contents.

As an example, since x.data is generated on the fly, the "at 0x..." part can say something completely different on each access:

In [7]: x = numpy.arange(8)

In [8]: x.data
Out[8]: <read-write buffer for 0x24a6b60, size 64, offset 0 at 0x7f3a7b3438f0>

In [9]: x.data
Out[9]: <read-write buffer for 0x24a6b60, size 64, offset 0 at 0x7f3a7b343970>
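
The same point can be checked without reading addresses out of a repr. This is a rough sketch, assuming Python 3, where x.data is a memoryview rather than the old buffer type shown above:

```python
import numpy as np

x = np.arange(8)

# Each access to x.data builds a fresh buffer object (a memoryview in Python 3).
buf_a = x.data
buf_b = x.data
distinct_objects = buf_a is not buf_b

# The address of the underlying storage stays the same between accesses.
addr_from_interface = x.__array_interface__['data'][0]
addr_from_ctypes = x.ctypes.data
same_storage = addr_from_interface == addr_from_ctypes

print(distinct_objects, same_storage)
```

To compare where array data actually lives, look at the data pointer (x.__array_interface__['data'][0] or x.ctypes.data), not at the address of the wrapper object.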
