Python sort parallel arrays in place?

Question

Is there an easy (meaning without rolling one's own sorting function) way to sort parallel lists without unnecessary copying in Python? For example:

foo = range(5)
bar = range(5, 0, -1)
parallelSort(bar, foo)
print foo # [4,3,2,1,0]
print bar # [1,2,3,4,5]

I've seen the examples using zip but it seems silly to copy all your data from parallel lists to a list of tuples and back again if this can be easily avoided.

What do you think this parallelSort would do? It appears from your comments that it sort foo in decreasing order and bar in increasing order -- is that right? — user97370
– user97370, Commented Feb 8, 2010 at 15:55
What will parallelSort gives if initially foo is [2,4,6,10,8] and bar is [3,7,9,5,1]? — kennytm
– kennytm, Commented Feb 8, 2010 at 15:58
@Kenny: bar is now sorted and equals [1,3,5,7,9]. foo is manipulated in lockstep and now equals [8,2,10,4,6]. — dsimcha
– dsimcha, Commented Feb 8, 2010 at 15:59

Nick Johnson · Accepted Answer · 2010-02-09 08:49:34Z

6

Here's an easy way:

perm = sorted(xrange(len(foo)), key=lambda x:foo[x])

This generates a list of permutations - the value in perm[i] is the index of the ith smallest value in foo. Then, you can access both lists in order:

for p in perm:
  print "%s: %s" % (foo[p], bar[p])

You'd need to benchmark it to find out if it's any more efficient, though - I doubt it makes much of a difference.

edited Feb 9, 2010 at 8:49

answered Feb 8, 2010 at 18:19

Nick Johnson

101k17 gold badges131 silver badges198 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Chris Lutz Over a year ago

Change range to xrange if you want to make a difference. Unless you're using Python 3.

Nick Johnson Over a year ago

Hm, true. Or use .sort instead of sorted, but that ruins the one-liner-ness. ;)

ninjagecko Over a year ago

turns out this is no better than sorting them out-of-place, because sorted will greedily allocate lots of memory, e.g. sorted(range(10**6),key=lambda x:x). (by range I mean xrange, it's been changed in python3) You will notice a significant chunk of your RAM disappear when you do this. Turns out that sorted is smart enough not to sort range though, so beware of testing without a key= function.

Jeffrey Harris · Accepted Answer · 2010-02-08 16:26:11Z

3

Is there an easy way? Yes. Use zip.

Is there an "easy way that doesn't use a zip variant"? No.

If you wanted to elaborate on why you object to using zip, that would be helpful. Either you're copying objects, in which case Python will copy by reference, or you're copying something so lightweight into a lightweight tuple as to not be worthy of optimization.

If you really don't care about execution speed but are especially concerned for some reason about memory pressure, you could roll your own bubble sort (or your sort algorithm of choice) on your key list which swaps both the key list and the target lists elements when it does a swap. I would call this the opposite of easy, but it would certainly limit your working set.

edited Feb 8, 2010 at 16:26

answered Feb 8, 2010 at 16:18

Jeffrey Harris

3,57029 silver badges30 bronze badges

4 Comments

Nick Johnson Over a year ago

Just because you can't think of an easy way that doesn't use zip doesn't mean there isn't one - see my answer. :)

Jeffrey Harris Over a year ago

Your answer is zipping by another name, so I stand behind "there is no easy way that doesn't use a zip variant". This was a silly question to begin with, though, so if sorting in memory what are essentially tuples of (sort_value, index) is preferable to sorting tuples of (sort_value, target_value), fine.

Nick Johnson Over a year ago

"Zipping by another name"? It's certainly not - it doesn't have anything to do with zipping, and doesn't modify the original elements at all. Indeed, it doesn't even touch the second array.

Jeffrey Harris Over a year ago

Zipping just means constructing an iterable of tuples from two or more source iterables. What do you think sorted(indices, key=foo) is doing? It's constructing a list of tuples of (foo_val, index), then sorting them. That is indistinguishable from zipping (foo_val, target_val) and sorting.

bayer · Accepted Answer · 2010-02-08 16:12:57Z

0

To achieve this, you would have to implement your own sort.

However: Does the unnecessary copying really hurt your application? Often parts of Python strike me as inefficient, too, but they are efficient enough for what I need.

answered Feb 8, 2010 at 16:12

bayer

6,89227 silver badges35 bronze badges

8 Comments

dsimcha Over a year ago

I see your point about avoiding premature optimization, but sometimes (this being one such case) I like to write generic code and know that if I ever use it on a huge dataset or something it will "just work". In this case I'm more worried about running out of memory than about speed.

SilentGhost Over a year ago

and wouldn't your own sort involve use of zip, dict, etc?

bayer Over a year ago

No. Say you implement your own quicksort - you can make sure to make any swaps on both lists.

Mike Graham Over a year ago

I would suspect any parallel sort implemented in Python for this would use a lot more time and a little more memory. The big-O speed and space cannot be improved, and you're adding a lot of overhead to do this in pure Python.

Mike Graham Over a year ago

@dimcha, If you have a large dataset and are running out of memory, the solution may be this, but it may be to use a numpy array, or an array.array, or a database, etc. Consider whether this is the solution that will really help.

|

Mike Graham · Accepted Answer · 2010-02-08 16:52:36Z

Any solution I can imagine short of introducing a sort from scratch uses indices, or a dict, or something else that really is not apt to save you memory. In any event, using zip will only increase memory usage by a constant factor, so it is worth making sure this is really a problem before a solution.

If it does get to be a problem, there may be more effective solutions. Since the elements of foo and bar are so closely related, are you sure their right representation is not a list of tuples? Are you sure they should not be in a more compact data structure if you are running out of memory, such as a numpy array or a database (the latter of which is really good at this kind of manipulation)?

(Also, incidentally, itertools.izip can save you a little bit of memory over zip, though you still end up with the full zipped list in list form as the result of sorted.)

Collectives™ on Stack Overflow

Python sort parallel arrays in place?

4 Answers 4

3 Comments

4 Comments

8 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

4 Comments

8 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related