
I am trying to understand how to set the dtypes of an array. My original numpy array has dimensions (583760, 7), i.e. 583760 rows and 7 columns. I am setting the dtype as follows:

>>> allRics.shape
(583760, 7)
>>> allRics.dtype = [('idx', np.float), ('opened', np.float), ('time', np.float),('trdp1',np.float),('trdp0',np.float),('dt',np.float),('value',np.float)]
>>> allRics.shape
(583760, 1)

Why does the shape of the original array change? What causes this change? I am basically trying to sort the original numpy array by the time column, and that's why I am setting the dtype. But after the dimension change, I am not able to sort the array:

>>> x=np.sort(allRics,order='time')

There is no change in the output of the above command. Could you please advise?

  • What do you expect to see happen instead of that? Commented Sep 11, 2012 at 2:59
  • The dimensions should remain the same (583760, 7), and finally I should be able to sort by the time column using order='time'. Commented Sep 11, 2012 at 3:02

2 Answers

You are turning your array into a structured array. Instead of a 2D array, it is now treated as an array of structs, with each row of 7 values collapsed into a single record. Take a look at a simpler example below:

>>> import numpy as np
>>> arr = np.array([(1,2,3),(3,4,5)])
>>> arr
array([[1, 2, 3],
       [3, 4, 5]])
>>> arr.shape
(2, 3)
>>> arr.dtype=[('a',int),('b',int),('c', int)]
>>> arr  # Notice the tuples inside the elements
array([[(1, 2, 3)],
       [(3, 4, 5)]], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr.shape
(2, 1)
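
To make the shape change concrete, here is a minimal sketch. The names `plain`, `record_dtype`, `records` and `flat` are illustrative and not from the question, and it uses `view` instead of assigning to `.dtype`, on the assumption that both reinterpret the same bytes in the same way. Each row of three 8-byte integers becomes one 24-byte record, so the last axis shrinks from 3 to 1.

```python
import numpy as np

# Illustrative data: 2 rows, 3 int64 columns (8 bytes each).
plain = np.array([[1, 2, 3], [3, 4, 5]], dtype=np.int64)

# Hypothetical field names, matching the example above.
record_dtype = [('a', np.int64), ('b', np.int64), ('c', np.int64)]

# Reinterpret each 24-byte row as a single 24-byte record.
records = plain.view(record_dtype)
print(records.shape)     # (2, 1)
print(records.itemsize)  # 24

# Dropping the length-1 axis gives an ordinary 1D array of records.
flat = records.ravel()
print(flat.shape)        # (2,)
print(flat['b'])         # [2 4]
```

Fields are then addressed by name (`flat['b']`) rather than by column index.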

The structured array not sorting looks like a bug. A workaround is to declare the array as a structured array to begin with:

>>> arr_s = np.sort(arr, order='b')
>>> arr_s
array([[(1, 2, 3)],
       [(3, 4, 5)]], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> dtype=[('a',np.int64),('b',np.int64),('c', np.int64)]
>>> arr = np.array([(5,2,3),(3,4,1)], dtype=dtype)
>>> arr
array([(5, 2, 3), (3, 4, 1)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='a')
>>> arr_s
array([(3, 4, 1), (5, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='b')
>>> arr_s
array([(5, 2, 3), (3, 4, 1)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> arr_s = np.sort(arr, order='c')
>>> arr_s
array([(3, 4, 1), (5, 2, 3)], 
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
>>> 
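
A possible alternative reading of the unsorted result, offered as a sketch rather than a confirmed diagnosis of the original problem: `np.sort` sorts along the last axis by default, and in an array of shape (N, 1) that axis has length 1, so there is nothing to reorder. Under that assumption, passing `axis=0` or flattening first should sort the records. The variable names below are illustrative.

```python
import numpy as np

record_dtype = [('a', np.int64), ('b', np.int64), ('c', np.int64)]

# Build a structured array with the same (N, 1) shape as in the question.
arr = np.array([(5, 2, 3), (3, 4, 1)], dtype=record_dtype).reshape(2, 1)

# Default axis=-1: each "row" holds a single record, so the order is unchanged.
default_sorted = np.sort(arr, order='a')

# Sorting along axis 0 compares the records across rows.
axis0_sorted = np.sort(arr, axis=0, order='a')

# Flattening to 1D first has the same effect.
flat_sorted = np.sort(arr.ravel(), order='a')

print(default_sorted['a'].ravel())  # [5 3]
print(axis0_sorted['a'].ravel())    # [3 5]
print(flat_sorted['a'])             # [3 5]
```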

4 Comments

Did you mean "now" instead of "not"?
So that explains my output array; it seems to be correct. I don't understand why my sort function doesn't work. x=np.sort(allRics,order='time') is not sorted by the time column. Am I missing anything here?
Yes, "now", not "not". I also added comments on the sorting. It looks to be a bug.
Thanks for the link. I guess the issue is with the endianness of the columns, which may be causing the problem. I will try it out.

You might be able to avoid structured arrays altogether if all you are using them for is sorting. You could do something like:

new_order = np.argsort(allRics[:, 2])
x = allRics[new_order]
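
As a small self-contained illustration of the same idea: `sample` below is a made-up stand-in for `allRics`, with column 2 playing the role of the time column. The `kind='stable'` argument is an optional addition, not part of the original suggestion, that keeps rows with equal times in their original relative order.

```python
import numpy as np

# Hypothetical stand-in for allRics: column 2 plays the role of 'time'.
sample = np.array([
    [0.0, 1.0, 30.0],
    [1.0, 1.0, 10.0],
    [2.0, 0.0, 20.0],
])

# Indices that would sort the time column in ascending order.
new_order = np.argsort(sample[:, 2], kind='stable')

# Fancy indexing reorders whole rows; the 2D shape is preserved.
sorted_rows = sample[new_order]

print(new_order)          # [1 2 0]
print(sorted_rows[:, 2])  # [10. 20. 30.]
print(sorted_rows.shape)  # (3, 3)
```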

1 Comment

Thanks. I already implemented the same solution. It worked for me.
