You comment:
But isn't it strange that such a basic operation fails? Just a simple array x = numpy.zeros(3, dtype={'names':['col1', 'col2'], 'formats':['i4','f4']}) fails to delete a column with numpy.delete(x,0,1). What is the root cause of this issue, any ideas?
np.delete isn't a basic operation. Look at its code. It's 5 screens long (in IPython). A lot of that handles the different ways that you can specify the elements to delete.
For
np.delete(x, 0, axis=1)
it uses a special case
# optimization for a single value
...
newshape[axis] -= 1
new = empty(newshape, arr.dtype, arrorder)
slobj[axis] = slice(None, obj)
new[slobj] = arr[slobj]
slobj[axis] = slice(obj, None)
slobj2 = [slice(None)]*ndim
slobj2[axis] = slice(obj+1, None)
new[slobj] = arr[slobj2]
For a 2d array and axis=1, that amounts to:
new = np.zeros((x.shape[0], x.shape[1]-1), dtype=x.dtype)
new[:, :obj] = x[:, :obj]
new[:, obj:] = x[:, obj+1:]
In other words, it allocates a new array with 1 less column, and then copies two slices from the original to it.
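To make that concrete, here is a small runnable sketch of the same allocate-and-copy idea for a plain 2d array. The helper name delete_one_column is mine, not something in numpy, and it only covers the single-integer, axis=1 case described above.

```python
import numpy as np

def delete_one_column(arr, obj):
    # Hypothetical helper: mimics the single-index, axis=1 case for a 2d array.
    # Allocate a result with one column fewer, then copy the two slices around obj.
    new = np.empty((arr.shape[0], arr.shape[1] - 1), dtype=arr.dtype)
    new[:, :obj] = arr[:, :obj]        # columns before the deleted one
    new[:, obj:] = arr[:, obj + 1:]    # columns after the deleted one
    return new

a = np.arange(12).reshape(3, 4)
result = delete_one_column(a, 1)
matches = np.array_equal(result, np.delete(a, 1, axis=1))
print(result)
print(matches)
```

Both slice assignments index the second dimension, which is exactly what a 1d structured array doesn't have.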
With multiple delete columns and boolean obj it takes other routes.
Notice that fundamental to that action is the ability to index the 2 dimensions.
But you can't index your x that way. x[0,1] gives a too many indices error. You have to use x[0]['col1']. Indexing the fields of a dtype is fundamentally different from indexing the columns of a 2d array.
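A quick check of that difference, using your x. The variable names raised and first_col1 are just for illustration.

```python
import numpy as np

x = np.zeros(3, dtype={'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})

# The array has a single dimension; the fields are part of the dtype, not an axis.
print(x.shape, x.ndim)

try:
    x[0, 1]            # two indices on a 1d array
    raised = False
except IndexError as err:
    raised = True
    print(err)

# Field access goes by name instead.
first_col1 = x[0]['col1']
print(first_col1)
```

So there is no axis=1 for np.delete to work on.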
The recfunctions manipulate the dtype fields in ways that regular numpy functions don't. Based on previous study, I'm guessing that drop_fields does something like this:
In [57]: x # your x with some values
Out[57]:
array([(1, 3.0), (2, 2.0), (3, 1.0)],
dtype=[('col1', '<i4'), ('col2', '<f4')])
Target array, with different dtype (missing one field)
In [58]: y=np.zeros(x.shape, dtype=x.dtype.descr[1:])
copy values, field by field:
In [60]: for name in y.dtype.names:
...: y[name]=x[name]
In [61]: y
Out[61]:
array([(3.0,), (2.0,), (1.0,)],
dtype=[('col2', '<f4')])
Regular n-d indexing is built around the shape and strides attributes. With these (and the element byte size) it can quickly identify the location in the data buffer of a desired element.
With a compound dtype, shape and strides work the same way, but the itemsize is different. In your x case it is 8: 4 bytes each for the i4 and f4 fields (nbytes for the whole array is 24). So regular indexing steps from one 8 byte record to the next. To select the 'col2' field, it has to take the further step of selecting the 2nd set of 4 bytes within each record.
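The byte layout can be read off the array and its dtype directly. This is just a sketch using standard attributes (itemsize, nbytes, strides, dtype.fields); the offsets dictionary is my own construction for display.

```python
import numpy as np

x = np.zeros(3, dtype={'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})

itemsize = x.itemsize    # bytes in one record
nbytes = x.nbytes        # bytes in the whole array (3 records)
strides = x.strides      # step, in bytes, from one record to the next

# dtype.fields maps each name to (field dtype, byte offset within the record)
offsets = {name: x.dtype.fields[name][1] for name in x.dtype.names}

print(itemsize, nbytes, strides, offsets)
```

The offset of 'col2' is the extra step that field selection has to add on top of the regular stride arithmetic.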
Where possible I think it translates field selection into regular indexing. __array_interface__ is a nice dictionary of the essential attributes of an array.
In [70]: x.__array_interface__
Out[70]:
{'data': (68826112, False),
'descr': [('col1', '<i4'), ('col2', '<f4')],
'shape': (3,),
'strides': None,
'typestr': '|V8',
'version': 3}
In [71]: x['col2'].__array_interface__
Out[71]:
{'data': (68826116, False),
'descr': [('', '<f4')],
'shape': (3,),
'strides': (8,),
'typestr': '<f4',
'version': 3}
The second array points to the same data buffer, but 4 bytes further along (the first col2 value). In effect it is a view.
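You can confirm the view behavior without reading raw addresses by eye. A minimal sketch, with the same values as the x above:

```python
import numpy as np

x = np.array([(1, 3.0), (2, 2.0), (3, 1.0)],
             dtype=[('col1', '<i4'), ('col2', '<f4')])
col2 = x['col2']

# Same buffer, with the start address shifted by the field offset.
shares = np.shares_memory(x, col2)
addr_diff = (col2.__array_interface__['data'][0]
             - x.__array_interface__['data'][0])

# Writing through the field view changes the original records.
col2[0] = 10.0
print(shares, addr_diff, col2.strides)
print(x)
```

The view keeps the 8 byte stride of the records even though each of its elements is only 4 bytes wide.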
(np.transpose is another function that does not operate across the dtype boundary.)
===================
Here's the code for drop_fields (summarized):
In [74]: from numpy.lib import recfunctions # separate import statement
In [75]: recfunctions.drop_fields??
def drop_fields(base, drop_names, usemask=True, asrecarray=False):
.... # define the _drop_descr function
newdtype = _drop_descr(base.dtype, drop_names)
output = np.empty(base.shape, dtype=newdtype)
output = recursive_fill_fields(base, output)
return output
recursive_fill_fields does a name by name field copy, and is able to handle dtypes that define fields within fields (the recursive part).
In [81]: recfunctions.drop_fields(x, 'col1')
Out[81]:
array([(3.0,), (2.0,), (1.0,)],
dtype=[('col2', '<f4')])
In [82]: x[['col2']] # multifield selection that David suggests
Out[82]:
array([(3.0,), (2.0,), (1.0,)],
dtype=[('col2', '<f4')])
In [83]: x['col2'] # single field view
Out[83]: array([ 3., 2., 1.], dtype=float32)
drop_fields produces a result similar to the multifield indexing that @David suggests. However that multifield indexing is poorly developed, as you will see if you try some sort of assignment.
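Here is the same comparison as a standalone script. Treat it as a sketch: whether x[['col2']] comes back as a copy or as a view onto x has changed between numpy versions, so I only check the things that should hold either way.

```python
import numpy as np
from numpy.lib import recfunctions

x = np.array([(1, 3.0), (2, 2.0), (3, 1.0)],
             dtype=[('col1', '<i4'), ('col2', '<f4')])

# drop_fields builds a new array with the reduced dtype and copies field by field.
dropped = recfunctions.drop_fields(x, 'col1')
independent = not np.shares_memory(x, dropped)

# Multifield indexing selects the same field by name.
selected = x[['col2']]

print(dropped.dtype, dropped)
print(selected.dtype.names, selected['col2'])
```

Either way you end up with a structured array that has only the 'col2' field; the original x is not modified.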