7

I have two lists, x and y, and I want to sort x and permute y by the permutation of x-sorting. For example, given

x = [4, 2, 1, 3]
y = [40, 200, 1, 30]

I want to get

x_sorted = [1,2,3,4]
y_sorted = [1, 200, 30, 40]

As discussed in past questions, a simple way to solve this is

x_sorted, y_sorted = zip(*sorted(zip(x,y)))

Here is my question: What is the FASTEST way to do this?


I have three methods to do the task.

import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)

method 1:

x_sorted, y_sorted = zip(*sorted(zip(x,y))) #1.08 ms 

method 2:

foo = zip(x,y)
foo.sort()
zip(*foo)       #1.05 ms

method 3;

ind = range(1000)
ind.sort(key=lambda i:x[i])
x_sorted = [x[i] for i in ind]
y_sorted = [y[i] for i in ind]  #934us

Is there a better method, which executes faster than above three methods?


Additional questions.

  1. Why method 2 is not faster than method 1 though it uses sort method?
  2. If I execute method 2 separately, it is faster. In IPython terminal,

I have

%timeit foo = zip(x,y)   #1000 loops, best of 3: 220 us per loop
%timeit foo.sort()       #10000 loops, best of 3: 78.9 us per loop
%timeit zip(*foo)        #10000 loops, best of 3: 73.8 us per loop

3 Answers 3

7

Using numpy.argsort:

>>> import numpy as np
>>> x = np.array([4,2,1,3])
>>> y = np.array([40,200,1,30])
>>> order = np.argsort(x)
>>> x_sorted = x[order]
>>> y_sorted = y[order]
>>> x_sorted
array([1, 2, 3, 4])
>>> y_sorted
array([  1, 200,  30,  40])

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.030632019043

NOTE

This makes sense if input data are already numpy arrays.

Sign up to request clarification or add additional context in comments.

1 Comment

great, obvious winner here :)
5

You're not timing this properly

%timeit foo.sort()

After the 1st loop, it's already sorted for the remainder. Timsort is very efficient for presorted lists.

I was a little surprised that @Roman's use of a key function was so much faster. You can improve on this further by using itemgetter

from operator import itemgetter
ig0 = itemgetter(0)
zip(*sorted(zip(x, y), key=ig0))

This is about 9% faster than using a lambda function for lists of 1000 elements

1 Comment

great, checked your solution, it gives me 0.7580892901514744, +1 for you
4
>>> x = [4, 2, 1, 3]
>>> y = [40, 200, 1, 30]    
>>> x_sorted, y_sorted = zip(*sorted(zip(x, y), key=lambda a:a[0]))
>>> x_sorted
(1, 2, 3, 4)
>>> y_sorted
(1, 200, 30, 40)

Performance:

>>> timeit('foo = zip(x,y); foo.sort(); zip(*foo)', 'from __main__ import x, y', number=1000)
1.0197240443760691
>>> timeit('zip(*sorted(zip(x,y)))', 'from __main__ import x, y', number=1000)
1.0106219310922597
>>> timeit('ind = range(1000); ind.sort(key=lambda i:x[i]); x_sorted = [x[i] for i in ind]; y_sorteds = [y[i] for i in ind]', 'from __main__ import x, y', number=1000)
0.9043525504607857
>>> timeit('zip(*sorted(zip(x, y), key=lambda a:a[0]))', 'from __main__ import x, y', number=1000)
0.8288150863453723

To see the full picture:

>>> timeit('sorted(x)', 'from __main__ import x, y', number=1000)
0.40415491505723367            # just getting sorted list from x
>>> timeit('x.sort()', 'from __main__ import x, y', number=1000)
0.008009909448446706           # sort x inplace

@falsetru method - fastest for np.arrays

>>> timeit('order = np.argsort(x); x_sorted = x[order]; y_sorted = y[order]', 'from __main__ import x, y, np', number=1000)
0.05441799872323827

As @AshwiniChaudhary suggested in comments, for lists there's a way to speed it up by using itertools.izip instead of zip:

>>> timeit('zip(*sorted(izip(x, y), key=itemgetter(0)))', 'from __main__ import x, y;from operator import itemgetter;from itertools import izip', number=1000)
0.4265049757161705

8 Comments

You can use itertools.izip for inner zip to make it memory efficient.
@AshwiniChaudhary checked :)
Don't use izip outside of sorted as it returns an iterator not list.
ok, but could it be useful if we need to get just y_sorted, like y_sorted = [k[1] for k in izip(*sorted(izip(x, y), key=itemgetter(0)))]?
But that way we can only get y_sorted not x_Sorted.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.