
I want to sort a string array using numpy by the length of the elements.

>>> import numpy as np
>>> arr = ["year","month","eye","i","stream","key","house"]
>>> x = np.sort(arr, axis=-1, kind='mergesort')
>>> print(x)
['eye' 'house' 'i' 'key' 'month' 'stream' 'year']

But it sorts them in alphanumeric order. How can I sort them using numpy by their length?

  • If you're doing this because you think it might be faster, you're misunderstanding how numpy works. It's designed for elements that occupy the same number of bytes. Unequal-length strings don't satisfy that, so numpy pads every element to the length of the longest one (a fixed-width dtype such as '<U6' here), which is probably no more efficient. Commented Jun 16, 2016 at 12:15
  • While there are a lot of good reasons to use numpy, I have to agree with Oliver's remarks. This example naturally calls for Python's built-in sort methods, which allow a more elegant and compact syntax. (But maybe you have other reasons to use numpy and this was only an example.) Commented Jun 16, 2016 at 12:19
  • Thank you. I'm new to numpy and just trying to understand it. @sascha So when I have many string elements to sort by length, which approach should I use? Only Python's own sort method? Commented Jun 16, 2016 at 12:27
  • Thank you @OliverW. Commented Jun 16, 2016 at 12:27
  • Use whatever fits your use case. I don't think there will be much difference in performance. If the other parts of the code use numpy, stick with it. If not, using numpy just for sorting is awkward, because the built-in sort can do this too. It's hard to give recommendations without knowing what you are doing (the use case above doesn't really need numpy). Commented Jun 16, 2016 at 12:30
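To see how numpy actually stores this particular list, you can inspect the array's dtype. When `np.array` is given plain Python strings, it chooses a fixed-width unicode dtype padded to the longest element; object dtype only appears if you request it explicitly. A minimal check, assuming a reasonably recent NumPy release:

```python
import numpy as np

arr = np.array(["year", "month", "eye", "i", "stream", "key", "house"])

# numpy picks a fixed-width unicode dtype wide enough for the longest
# string ("stream", 6 characters); shorter strings are padded.
print(arr.dtype)           # <U6 on a little-endian machine
print(arr.dtype.itemsize)  # 24: every element takes 6 characters * 4 bytes

# Python objects are only stored if you ask for dtype=object.
obj = np.array(["year", "month"], dtype=object)
print(obj.dtype)           # object
```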

2 Answers

Add a helper array containing the lengths of the strings, then use numpy's argsort, which gives you the indices that would sort the data by these lengths. Index the original data with these indices:

import numpy as np
arr = np.array(["year","month","eye","i","stream","key","house"])  # np-array needed for later indexing
arr_ = list(map(len, arr))  # list() is required in py3, where map returns an iterator
x = arr[np.argsort(arr_, kind='mergesort')]  # mergesort is stable: equal lengths keep their original order
print(x)

2 Comments

There's also numpy.char.str_len (but it's only slightly faster).
That's because you are using Python 3 and missed the remark in my code comment. Use arr_ = list(map(...)) instead of just map(...): in Python 3, map returns an iterator, not a list or array, so this extra step is needed.
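The `numpy.char.str_len` suggestion from the first comment removes the `map`/`list` step entirely, so the same code works unchanged in Python 2 and 3. A minimal sketch of that variant; the stable `kind="mergesort"` is an optional choice, not part of the original answer:

```python
import numpy as np

arr = np.array(["year", "month", "eye", "i", "stream", "key", "house"])

# np.char.str_len returns an integer array of per-element string lengths.
lengths = np.char.str_len(arr)

# A stable sort keeps equal-length strings in their original order.
x = arr[np.argsort(lengths, kind="mergesort")]
print(x)  # ['i' 'eye' 'key' 'year' 'month' 'house' 'stream']
```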

If I expand your list to arr1=arr*1000, the Python list sort using len as the key function is fastest.

In [77]: len(arr1)
Out[77]: 7000

In [78]: timeit sarr=sorted(arr1,key=len)
100 loops, best of 3: 3.03 ms per loop

In [79]: %%timeit
arrA=np.array(arr1)
larr=[len(i) for i in arrA]  # list comprehension works same as map
sarr=arrA[np.argsort(larr)]
   ....: 
100 loops, best of 3: 7.77 ms per loop

Converting the list to an array takes about 1 ms (a significant overhead, especially for small lists). Even with an already-created array and np.char.str_len, the numpy version is still slower than the Python sort:

In [83]: timeit sarr=arrA[np.argsort(np.char.str_len(arrA))]
100 loops, best of 3: 6.51 ms per loop

The np.char functions can be convenient, but they still basically iterate over the elements, applying the corresponding str method to each one.
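Outside IPython, the standard-library `timeit` module can run a comparison like the one in this answer. This is a rough sketch: absolute timings depend on the machine and NumPy version, so none are shown here. Note that the default `argsort` is not stable, so the two results agree on the sequence of lengths but may order equal-length strings differently.

```python
import timeit

import numpy as np

arr = ["year", "month", "eye", "i", "stream", "key", "house"]
arr1 = arr * 1000          # 7000 strings, as in the session above
arrA = np.array(arr1)      # built once, so list-to-array conversion is not timed


def py_sort():
    # Built-in sort with len as the key function.
    return sorted(arr1, key=len)


def np_sort():
    # argsort over the per-element lengths, then fancy indexing.
    return arrA[np.argsort(np.char.str_len(arrA))]


for name, fn in [("sorted(key=len)", py_sort), ("argsort(str_len)", np_sort)]:
    per_loop = timeit.timeit(fn, number=100) / 100
    print(f"{name}: {per_loop * 1e3:.2f} ms per loop")
```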

In general, argsort gives you much of the same power as the key parameter of Python's sort.
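As an illustration of that point, a compound key such as "by length, then alphabetically" can be reproduced with argsort by sorting alphabetically first and then applying a stable argsort by length. A minimal sketch, assuming mergesort as the stable sort kind:

```python
import numpy as np

arr = np.array(["year", "month", "eye", "i", "stream", "key", "house"])

# Python: one key function returning (length, string).
by_key = sorted(arr.tolist(), key=lambda s: (len(s), s))

# numpy: alphabetical sort first, then a stable argsort by length.
# Stability keeps the alphabetical order inside each length group.
alpha = np.sort(arr)
by_argsort = alpha[np.argsort(np.char.str_len(alpha), kind="mergesort")]

print(by_key)               # ['i', 'eye', 'key', 'year', 'house', 'month', 'stream']
print(by_argsort.tolist())  # same list
```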

