
I've got a numpy array that contains some numbers and strings in separate columns:

import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

print(a[a[:, 0].argsort()])

However, when I try to sort it by the first column using .argsort(), it is sorted in string order rather than numeric order:

[['1e-05' 'C']
 ['2' 'B']
 ['3e-05' 'A']]

How do I go about getting the array to sort in numeric order based on the first column?

  • Does this answer your question? Sorting arrays in NumPy by column Commented Jan 4, 2023 at 10:21
  • While your list of lists contains numbers and strings, the array you made from it is just strings. That should be clear from the sorted output. To get a numeric sort, you need numbers, not just strings that look like numbers. Have you considered using the Python sort with a key? Commented Jan 4, 2023 at 16:38
  • @CarlosHorn Not quite -- that solution works if none of the numbers in the array are in e-notation. Commented Jan 5, 2023 at 2:21
  • I edited your title because the key here is that the numpy array was created using floats and strings, and was converted to an array of strings. BTW "e-notation" is nothing special. It just denotes a regular float a*(10**b) as aEb. The numbers themselves are still the same floating-point numbers. Commented Jan 5, 2023 at 2:34
  • If you are not forced to use numpy, I would recommend using pandas, which IMHO is a better choice for representing data of various types. See pandas.pydata.org/pandas-docs/stable/reference/api/… to solve your sorting problem. Commented Jan 5, 2023 at 8:16
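A minimal sketch of the suggestion to use Python's sort with a key, assuming a plain Python list of rows is an acceptable result (the output is a list, not a NumPy array):

```python
import numpy as np

# The array from the question: NumPy coerces every element to a string.
a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Sort the rows with the built-in sorted(), converting the first
# column to float only inside the key function.
rows = sorted(a.tolist(), key=lambda row: float(row[0]))
print(rows)  # [['1e-05', 'C'], ['3e-05', 'A'], ['2', 'B']]
```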

2 Answers


In this case, a is an array of strings, as evidenced by a.dtype being '<U32'. Therefore, a[:, 0].argsort() will sort the column in lexical order.
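A quick check of that claim, as a sketch: printing the dtype and comparing the strings directly shows why '2' ends up between '1e-05' and '3e-05'.

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Mixing floats and strings makes NumPy pick a common string dtype.
print(a.dtype)   # <U32
print(a[:, 0])   # the "numbers" are really strings

# Strings compare character by character, so the leading '1', '2', '3'
# decide the order and the exponent is never considered.
print(sorted(['3e-05', '2', '1e-05']))  # ['1e-05', '2', '3e-05']
```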

To sort a column as numbers, it needs to be converted to numbers first, by calling .astype before .argsort:

import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

print(a[a[:, 0].astype(float).argsort()])

Output:

[['1e-05' 'C']
 ['3e-05' 'A']
 ['2' 'B']]
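As a hedged extension of the same idea, the intermediate argsort result can be inspected, and reversing it gives a descending sort. The names order, ascending and descending are just illustrative:

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Convert only the first column to float; `a` itself stays a string array.
order = a[:, 0].astype(float).argsort()
print(order)  # [2 0 1]

ascending = a[order]
# Reversing the index array gives a descending numeric sort.
descending = a[order[::-1]]
print(descending)
```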

If you have control over the creation of the array, you could create a structured array instead of a regular array.

import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

a = np.array( [( 3e-05, 'A' ),
               ( 2, 'B' ),
               ( 1e-05, 'C' )], dtype=dtypes)

Now, a is a structured array with a separate dtype for each field: the 'value' field (the first column) is an array of floats, and the 'label' field (the second column) is an array of strings.

Note that the array is defined as a list of tuples. This is important: defining it as a list of lists and then specifying dtype=dtypes won't work.
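If the data already exists as a list of lists, or as the string array from the question, here is a sketch of two ways to get a structured array anyway (the variable names are illustrative):

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

# Rows as they might arrive: a list of lists.
rows = [[3e-05, 'A'], [2, 'B'], [1e-05, 'C']]

# Converting each inner list to a tuple lets NumPy treat it as one record.
a = np.array([tuple(row) for row in rows], dtype=dtypes)
print(a['value'])

# Alternatively, fill an empty structured array field by field,
# starting from the all-string array shown in the question.
s = np.array([['3e-05', 'A'], ['2', 'B'], ['1e-05', 'C']])
b = np.empty(len(s), dtype=dtypes)
b['value'] = s[:, 0].astype(float)  # parse the numeric strings
b['label'] = s[:, 1]
print(b)
```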

Now, you can sort by a column like so:

a_sorted = np.sort(a, order=['value'])

which gives:

array([(1.e-05, 'C'), (3.e-05, 'A'), (2.e+00, 'B')],
      dtype=[('value', '<f8'), ('label', '<U32')])
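A small sketch of two related options, assuming the same dtypes as above: reversing the sorted result for descending order, and np.argsort with the same order argument to get indices instead of a sorted copy.

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]
a = np.array([(3e-05, 'A'), (2, 'B'), (1e-05, 'C')], dtype=dtypes)

# Sort by the numeric field, then reverse for descending order.
desc = np.sort(a, order=['value'])[::-1]
print(desc['label'])  # ['B' 'A' 'C']

# argsort also accepts order=, returning indices rather than a sorted copy.
idx = np.argsort(a, order=['value'])
print(idx)  # [2 0 1]
```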

You can get a row or column of this structured array like so:

>>> a_sorted[0]
(1.e-05, 'C')

>>> a_sorted['value']
array([1.e-05, 3.e-05, 2.e+00])
