sort numpy array of lists

Question

We are given a numpy array (ndarray) of lists (dtype=object), and would like to return a similar array of lists where each list is sorted. Is there an efficient way of doing this (i.e. without for loop etc)?

Please don't offer np.vectorize() as a solution as this is implemented as a for loop and is thus inefficient.

For example:

a=np.array([[5,4],[6,7,2],[8,1,9]],dtype=object)

so a is:

array([list([5, 4]), list([6, 7, 2]), list([8, 1, 9])], dtype=object)

and we would like the function to sort it so we would get:

array([list([4, 5]), list([2, 6, 7]), list([1, 8, 9])], dtype=object)

Numpy's speed comes from homogeneous rectangular arrays, not magic. Numpy won't be faster for your problem, unless np.sort is faster than timsort for your individual lists. — Andras Deak -- Слава Україні
– Andras Deak -- Слава Україні, Commented Dec 8, 2019 at 12:21
@AndrasDeak I disagree, the speed comes also comes from vectorizing by pushing loops into the c, and that does not necessarily require rectangular arrays. — Bitwise
– Bitwise, Commented Dec 8, 2019 at 12:23
Python's sort is also written in C. A lot of numpy speedup comes from caching in the CPU which is gone if your array only contains pointers to data far apart. — Andras Deak -- Слава Україні
– Andras Deak -- Слава Україні, Commented Dec 8, 2019 at 12:25
frompyfunc often is twice as fast as iteration when working with the elements of a object array. — hpaulj
– hpaulj, Commented Dec 8, 2019 at 16:16

EliadL · Accepted Answer · 2019-12-09 09:11:44Z

4

Your example, and an expanded version for time tests:

In [202]: a=np.array([[5,4],[6,7,2],[8,1,9]],dtype=object)                      
In [203]: A = a.repeat(100)

Applying Python list sort to each element:

In [204]: np.array([sorted(i) for i in a])                                      
Out[204]: array([list([4, 5]), list([2, 6, 7]), list([1, 8, 9])], dtype=object)

Using frompyfunc to do the same:

In [205]: np.frompyfunc(sorted,1,1)(a)                                          
Out[205]: array([list([4, 5]), list([2, 6, 7]), list([1, 8, 9])], dtype=object)

Some timings:

In [206]: timeit np.array(list(map(sorted, A)))                                 
168 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [207]: timeit np.array([sorted(i) for i in A])                               
181 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

map is a little faster than a list comprehension. I prefer the readability of the comprehension.

A pure list version is quite a bit faster:

In [208]: %%timeit temp=A.tolist() 
     ...: list(map(sorted, temp)) 
     ...:  
     ...:                                                                       
88.3 µs ± 70.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

frompyfunc is faster than the array map, and almost as good as the pure list version:

In [209]: timeit np.frompyfunc(sorted,1,1)(A)                                   
97.3 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That's the pattern I've seen before. frompyfunc is the fastest way to apply functions to elements of an object dtype array, but it seldom is better than a list based iteration.

edited Dec 9, 2019 at 9:11

EliadL

7,2283 gold badges30 silver badges45 bronze badges

answered Dec 8, 2019 at 18:22

hpaulj

233k14 gold badges260 silver badges392 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Bitwise Over a year ago

Thanks, very interesting. So how much of the difference between np.array(list(map(sorted, A))) and the pure python version comes from the need to convert the list back to a numpy ndarray at the end?

hpaulj Over a year ago

Most of the difference lies in that conversion back to array. Starting with a list helps some, but not nearly as much.

Bitwise Over a year ago

Great, thanks. Unforunately I need this to end up as a numpy array (for other fast operations), so I guess there is no getting around this.

Andras Deak -- Слава Україні Over a year ago

@Bitwise heads-up: those "other fast operations" might also be faster in native python.

Bitwise Over a year ago

@AndrasDeak thanks, in a general case that might be true, but unfortunately that is not the case for me.

|

EliadL · Accepted Answer · 2019-12-08 12:45:04Z

1

np.array(list(map(sorted, a)))

gives:

array([list([4, 5]), list([2, 6, 7]), list([1, 8, 9])], dtype=object)

answered Dec 8, 2019 at 12:45

EliadL

7,2283 gold badges30 silver badges45 bronze badges

1 Comment

Andras Deak -- Слава Україні Over a year ago

Good solution, but not numpy.

Collectives™ on Stack Overflow

sort numpy array of lists

2 Answers 2

6 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related