
i think this code handles n-d correctly (please verify), and i've hacked it to be polymorphic, but it's ugly. is there a better/more efficient way? i'm not sure it shares memory as much as possible. i want to avoid F ordering because i'm guessing numpy/python are happier with C. if i'm wrong about that, please enlighten me as to any advantages of using F order in this case.

specific questions (if this overall approach is best):

  • does array.array(.,numpy.ndarray.ravel) copy data? i suspect yes. would there be a way to share it instead? for instance, what would be the performance/memory sharing implications of using numpy.nditer(), numpy.ndarray.flat, or numpy.ndarray.flatten instead? see http://numpy-discussion.10968.n7.nabble.com/Why-ndarray-provides-four-ways-to-flatten-td38989.html
  • i need an array.peek method, why isn't there one?
  • if i can't have array.peek, is array.append as inefficient as i'm suspecting (copying the whole array again)? i could have used numpy.insert or numpy.append instead (which cleans up the code because i don't have to special case the empty array), but i believe those do copy.
  • are there any negative implications of F order?
  • i believe numpy.reshape and numpy.transpose efficiently create views without copying data -- but then, when the data is accessed, the access pattern will be all fragged up via the indirection through a bunch of layered views. doesn't this negate the whole point of numpy's contiguous memory layout in the first place?
  • in general, this type of information doesn't seem to be systematically laid out in any documentation i can find. do i have to resort to reading array/ndarray src?

code

function f
sz = [2 4 3 2];
d = reshape(1:prod(sz),sz)
d = to_np(d)
d = from_np(d)
end

function p = to_np(p)
sz = size(p);
p = reshape(p,[1 numel(p)]); % Conversion to Python is only supported for 1-N vectors.
p = py.numpy.array(p); % if empty, the type is always set to double because of https://github.com/numpy/numpy/issues/6028
p = p.reshape(num2cell(fliplr(sz)));
t = 0:length(sz)-1;
t(end-[1 0]) = t(end-[0 1]); % swap the last two axes so the trailing numpy dims are (rows,cols)
p = p.transpose(num2cell(t));
end

function p = from_np(p)
sz = cellfun(@double,cell(p.shape));

method = 1;
switch method
    case 1
        empty = any(sz == 0);
        if empty
            p = py.numpy.insert(p,0,0); % casts 0 to p's type so we can later pop it for use with matlab's cast
        end
        p = py.array.array(p.dtype.char,p.ravel); % does this copy data -- how share memory?  any better to use py.numpy.nditer(p), p.flat, or p.flatten?
        % http://numpy-discussion.10968.n7.nabble.com/Why-ndarray-provides-four-ways-to-flatten-td38989.html
        c = p.pop(); % i can has peek?
        if ~empty
            p.append(c); % i suspect this could be very inefficient, does it copy the whole thing again?
        end
    case 2
        p = py.numpy.insert(p,p.size,0); % numpy.insert / numpy.append copy the whole array, so maybe worse than array.append?
        p = py.array.array(p.dtype.char,p); % numpy.insert already flattened p
        c = p.pop();
end

p = cast(p,'like',c); % convert to matlab numeric, matching the type of the popped element c
p = reshape(p,fliplr(sz));
t = 1:length(sz);
t([1 2]) = t([2 1]); % swap the first two dims to undo the numpy axis convention
p = permute(p,t);
end

output

d(:,:,1,1) =
     1     3     5     7
     2     4     6     8
d(:,:,2,1) =
     9    11    13    15
    10    12    14    16
d(:,:,3,1) =
    17    19    21    23
    18    20    22    24
d(:,:,1,2) =
    25    27    29    31
    26    28    30    32
d(:,:,2,2) =
    33    35    37    39
    34    36    38    40
d(:,:,3,2) =
    41    43    45    47
    42    44    46    48
d = 
  Python ndarray with properties:

           T: [1x1 py.numpy.ndarray]
        base: [1x1 py.numpy.ndarray]
      ctypes: [1x1 py.numpy.core._internal._ctypes]
       dtype: [1x1 py.numpy.dtype]
       flags: [1x1 py.numpy.flagsobj]
        flat: [1x1 py.numpy.flatiter]
        imag: [1x1 py.numpy.ndarray]
    itemsize: 8
      nbytes: 384
        ndim: 4
        real: [1x1 py.numpy.ndarray]
       shape: [1x4 py.tuple]
        size: 48
     strides: [1x4 py.tuple]
    [[[[  1.   3.   5.   7.]
       [  2.   4.   6.   8.]]

      [[  9.  11.  13.  15.]
       [ 10.  12.  14.  16.]]

      [[ 17.  19.  21.  23.]
       [ 18.  20.  22.  24.]]]


     [[[ 25.  27.  29.  31.]
       [ 26.  28.  30.  32.]]

      [[ 33.  35.  37.  39.]
       [ 34.  36.  38.  40.]]

      [[ 41.  43.  45.  47.]
       [ 42.  44.  46.  48.]]]]
d(:,:,1,1) =
     1     3     5     7
     2     4     6     8
d(:,:,2,1) =
     9    11    13    15
    10    12    14    16
d(:,:,3,1) =
    17    19    21    23
    18    20    22    24
d(:,:,1,2) =
    25    27    29    31
    26    28    30    32
d(:,:,2,2) =
    33    35    37    39
    34    36    38    40
d(:,:,3,2) =
    41    43    45    47
    42    44    46    48
    This is strongly dependent on the py.numpy package, right? You might get other ideas from how scipy.io.loadmat loads .mat files. It has to work with the order=F issue. I believe cells are loaded as lists, and structures as dictionaries. Commented Jun 28, 2015 at 0:32
  • good idea to read loadmat, but i note that memory structure (i'll edit my question to make sure that emphasis comes across) may not be the same as storage structure, and certainly won't be "shared" (as in "no copies") in the way i'm looking for. :) Commented Jun 29, 2015 at 20:33
  • I am thinking about writing an interface to pack/read n-d array on disk in both MATLAB and Python, sharing the same protocol defined by you. Am I giving you too much work? Commented Jun 30, 2015 at 9:46
  • @Mai - i don't understand your question -- you're welcome to use the code posted here obviously. :) Commented Jun 30, 2015 at 10:01
  • Part of the reason you aren't getting more responses from numpy people is that we aren't sure where the MATLAB py interface leaves off, and familiar numpy code begins. And these days the closest I get to MATLAB is octave. So I can test things like .mat interfaces, but not py. Commented Jun 30, 2015 at 21:39

2 Answers


I'm going to tackle these questions bit by bit, edit by edit

np.array is the normal constructor for numpy arrays, particularly when starting from lists. In fact, from timings I get the impression that it converts its inputs (even arrays) to lists and back, giving it the most control over the interpretation and dtype of the inputs.

I haven't used np.ndarray much, but it appears to be a more basic constructor, suitable when you want to make an array from a preexisting data buffer.

In [333]: x=np.arange(10)
In [334]: y=np.array(x)

In [335]: x.__array_interface__
Out[335]: 
{'shape': (10,),
 'version': 3,
 'strides': None,
 'typestr': '<i4',
 'descr': [('', '<i4')],
 'data': (158045400, False)}

In [336]: y.__array_interface__
Out[336]: 
{'shape': (10,),
 'version': 3,
 'strides': None,
 'typestr': '<i4',
 'descr': [('', '<i4')],
 'data': (158072688, False)}

The first number under 'data' is the pointer to the array's data buffer. y is a copy of x, not a view.

Doing the same with x.flatten().__array_interface__ shows that it too is a copy. x.ravel(), on the other hand, is a view (same data pointer).

x.reshape((-1,1)) is also a view (same data pointer).

x.shape = (-1,1) is an in-place operation: not only doesn't it copy the data, it doesn't even copy the metadata.
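A quick way to verify any of these claims yourself is to compare the data-buffer pointers. A minimal sketch (ptr is just a helper I'm defining here):

import numpy as np

x = np.arange(10)
ptr = lambda a: a.__array_interface__['data'][0]   # address of the data buffer

print(ptr(x.ravel()) == ptr(x))           # True:  view, no copy
print(ptr(x.flatten()) == ptr(x))         # False: always copies
print(ptr(x.reshape((-1, 1))) == ptr(x))  # True:  view
print(ptr(np.array(x)) == ptr(x))         # False: np.array copies by default
print(ptr(np.asarray(x)) == ptr(x))       # True:  asarray skips the copy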

  • i need an array.peek method, why isn't there one?

What is peek supposed to do? Sounds like a look-ahead function for an iterator or other serial access object.
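If peek just means "return what pop would, without removing it", plain indexing on an array.array already does that. A sketch:

import array

xa = array.array('d', [1.0, 2.0, 3.0])

last = xa[-1]          # "peek": read the last element, nothing is removed
print(last, len(xa))   # 3.0 3

c = xa.pop()           # pop actually removes it
print(c, len(xa))      # 3.0 2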

  • if i can't have array.peek, is array.append as inefficient as i'm suspecting (copying the whole array again)? i could have used numpy.insert or numpy.append instead (which cleans up the code because i don't have to special case the empty array), but i believe those do copy.

np.append uses np.concatenate, which in turn creates a new array. That's fine for adding a value to an array once or twice, but new users mistakenly try to use it iteratively. Same for np.insert. A Python list's append is better for iterative work.

There isn't a way of adding an element to a numpy array without making a copy.
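To illustrate why (a sketch; the loop sizes are arbitrary): repeated np.append reallocates and copies the whole array on every call, while a list accumulates cheaply and converts once at the end.

import numpy as np

# anti-pattern: every np.append builds a brand-new array (quadratic overall)
a = np.array([], dtype=int)
for i in range(1000):
    a = np.append(a, i)

# better: accumulate in a list (amortized O(1) appends), convert once
buf = []
for i in range(1000):
    buf.append(i)
b = np.array(buf)

assert (a == b).all()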

  • are there any negative implications of F order?

Access time when indexing 2-d arrays varies with the axis and with C vs F order. Otherwise I wouldn't worry too much about it.
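If you do want to measure it, a sketch (timings are machine-dependent; asfortranarray just reorders the same values):

import numpy as np

c = np.ones((4000, 4000), order='C')
f = np.asfortranarray(c)   # same values, column-major layout

# reducing along the contiguous axis walks memory sequentially:
# axis=1 is contiguous for C order, axis=0 for F order
c.sum(axis=1)   # typically the fast case for the C-ordered array
f.sum(axis=0)   # typically the fast case for the F-ordered array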

  • i believe numpy.reshape and numpy.transpose efficiently create views without copying data -- but then, when the data is accessed, the access pattern will be all fragged up via the indirection through a bunch of layered views. doesn't this negate the whole point of numpy's contiguous memory layout in the first place?

reshape and transpose change attributes like shape and strides, but otherwise leave the data buffer alone. It is a little faster to access one contiguous block of data in the buffer, which for a C-order array means something like x[3] or x[3,:]. But the striding mechanism is supposed to give reasonable speed for any regular access pattern.
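For example (a sketch; strides shown assume 8-byte ints):

import numpy as np

a = np.arange(12, dtype=np.int64)
x = a.reshape(3, 4)   # view of a
y = x.T               # another view: shape and strides swapped, same buffer

print(x.strides)      # (32, 8): 32 bytes to the next row, 8 to the next column
print(y.strides)      # (8, 32): same buffer, walked in the other order
print(y.base is a)    # True: no data was copied anywhere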

  • in general, this type of information doesn't seem to be systematically laid out in any documentation i can find. do i have to resort to reading array/ndarray src?

I missed the fact that half of the questions focus on the Python array module, not numpy. I haven't used it, but the documentation looks straightforward. And testing:

In [420]: import array
In [421]: xa=array.array('i')  # create the array

In [422]: xa.fromlist(np.arange(10).tolist())
# can add elements in various ways, including all the list ones
# so to/from list is always an option

In [423]: xa
Out[423]: array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [424]: xa[3]  # can be indexed like a list, no need for a peek
Out[424]: 3

In [425]: xa[8:]
Out[425]: array('i', [8, 9])

The ndarray constructor is not used as much as np.array, but it is useful if you already have a data buffer. And apparently an array.array qualifies:

In [426]: xn=np.ndarray((10,),int,buffer=xa)    
In [427]: xn
Out[427]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Modifying elements of xa also modifies them in xn, because of the shared buffer:

In [428]: xa[3]=30
In [429]: xn[3]
Out[429]: 30

Adding or deleting elements of xa is bound to mess up xn. But xn can be reshaped, transposed, etc. In-place changes to xn (e.g. +=) are reflected in xa. And anything that makes a copy of xn loses the connection with xa.
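A sketch of that last point, recreating the pair from scratch ('i' is 4 bytes on common platforms, matching np.int32):

import array
import numpy as np

xa = array.array('i', range(10))
xn = np.ndarray((10,), np.int32, buffer=xa)

xn += 100            # in-place: visible through xa
print(xa[0])         # 100

z = xn + 1           # any copying operation gets its own buffer
z[0] = -1
print(xa[0], xn[0])  # still: 100 100 -- z is disconnected from xa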


Comments

the first question is asking if there is a way to make a standard array (docs.python.org/2/library/array.html) from a numpy ndarray that shares memory. matlab can only convert from standard arrays, that's why i need that step. flattening the ndarray is only part of this step. would using nditer (or some other iterator) have exactly the same space/time efficiency as .ravel? what about .flat? you're suggesting p.shape = (-1,1); array.array(.,p) would be best of all? when i try it, i get the error that the array has to be contiguous, which means making a copy.
peek should return like pop without removing the element. again, this question and the next (about append) are regarding standard arrays, not numpy arrays.
without peek, i need to pop and append a value off a potentially large standard array. i need the value so i can use matlab's cast like to match its type. although it's only a single append, i worry it requires a copy of the whole array. again, we're talking about standard arrays here, i know numpy.append copies.
regarding F order -- it's matlab's order, so i could avoid the numpy.transpose and matlab permute steps (i think). but will this help me make fewer copies? since it changes the order of elements wrt the flat standard array, i worry other numpy functions will depend on the order -- but you're saying either order behaves the same, except for the order of elements when reshapeing?
I've done almost nothing with the standard arrays; and don't know what's involved in converting to/from numpy. Regarding F order, loadmat returns arrays with that order. There may be numpy functions that prefer C order, but it is easy to convert the order.

anytime you pass a matlab array p (which must be 1xN) to a python function, matlab automatically converts it to an array.array. this won't work for 8-byte ints in python 2.7, where there is no such array.array typecode. open questions:

  • is it possible for these objects to share memory?
  • where is the matlab src that actually does this? it seems to be purposely hidden/opaque, not even in <matlabroot>/toolbox/matlab/external/interfaces/python/+python/+internal/.

here's how to create a numpy.ndarray that shares memory with that array.array: py.numpy.ndarray(length(p),py.numpy.dtype(py.numpy(class(p))),pyargs('buffer',p)). mutating items shows that this memory is not shared all the way back to matlab's array. (a python-side sketch of this construction follows the questions below.) open questions:

  • would there be a way to share memory all the way back to matlab?
  • why doesn't numpy.array(array.array) use the same buffer sharing as the ndarray constructor method?
  • why isn't the ndarray constructor mentioned here?
  • the ndarray constructor needs to know the actual matlab class, just using python's int or float type (py.type(p(1))) results in incorrect interpretation of the buffer. this isn't because memory is shared all the way back to matlab -- it's because p(1) is converted to a different python type when isolated (int or float) than when part of an array (where it will be one of the array.array types b, B, h, H, i, I, f or d). for 4 byte ints, why does matlab use i/I instead of l/L?
  • is there a way to share memory going the other way? array.array(.,numpy.ndarray)
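for reference, a sketch of the pure-python side of that construction, plus numpy.frombuffer, which appears to do the same buffer sharing with less ceremony (i haven't checked how matlab's py. bridge exposes frombuffer):

import array
import numpy as np

xa = array.array('d', [1.0, 2.0, 3.0])

# explicit ndarray constructor over an existing buffer (no copy)
xn = np.ndarray((len(xa),), dtype='d', buffer=xa)

# np.frombuffer shares the buffer too, with less ceremony
xf = np.frombuffer(xa, dtype='d')

xa[0] = 99.0
print(xn[0], xf[0])   # 99.0 99.0 -- all three share one buffer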

the way to index into an array.array from matlab (e.g. the last element) is py.operator.getitem(p,py.len(p)-1). for a numpy.ndarray, it's p.item(p.size - 1).

  • why don't they inherit from collections.Sequence -- if they did, matlab could index into them using p{1}, p(1), or cell(p).

matlab keeps data in F order. open questions:

  • could this be used to avoid the numpy.transpose and matlab permute steps and make fewer copies? (see the sketch below)
  • are there any implications of its usage other than changing the order of elements wrt reshaping/flattening?
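a sketch of what the F-order route could look like on the numpy side (py. bridging omitted; the flat array.array stands in for what matlab would pass):

import array
import numpy as np

# column-major flat data for the matlab matrix [1 2 3; 4 5 6]
xa = array.array('d', [1, 4, 2, 5, 3, 6])

m = np.frombuffer(xa, dtype='d').reshape((2, 3), order='F')
print(m)
# [[ 1.  2.  3.]
#  [ 4.  5.  6.]]
# no transpose/permute step, and this reshape is still a view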

there are many ways to flatten an ndarray. which one would be fastest and make the fewest copies in this use case?

  • .flatten makes a copy -- why would anyone want that?
  • .reshape((-1,)) is discouraged (i think?)

  • .ravel makes a view -- does this make it the best choice?

  • does numpy.nditer (or some other iterator like .flatiter) have exactly the same space/time efficiency as .ravel?
  • what about .flat?

  • p.shape = (-1,) might be the best, since it doesn't even copy metadata (reference?), but it requires a contiguous array, so in general requires making a copy first (see the sketch below). is there an approach in this use case (probably involving F order) that can guarantee contiguity? is it discouraged for the same reasons as .reshape((-1,))? how does it differ?
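a sketch of the difference -- shape assignment never copies, it just refuses:

import numpy as np

x = np.arange(12).reshape(3, 4)    # contiguous
x.shape = (12,)                    # fine: pure metadata change, no copy

y = np.arange(12).reshape(3, 4).T  # transposed view: not contiguous
try:
    y.shape = (12,)                # raises instead of silently copying
except AttributeError as e:
    print(e)                       # .reshape() would have returned a copy here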

is it really true that a ton of layered views does not significantly decrease performance of an ndarray? reference?

why doesn't numpy.array preserve the type of an empty array.array? https://github.com/numpy/numpy/issues/6028
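a minimal repro, assuming numpy still behaves as described in the issue (it did as of 2015):

import array
import numpy as np

print(np.array(array.array('i')).dtype)       # float64: typecode ignored when empty
print(np.array(array.array('i', [1])).dtype)  # int32 (intc): honored when non-empty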

