6

Is there an easy (meaning without rolling one's own sorting function) way to sort parallel lists without unnecessary copying in Python? For example:

foo = range(5)
bar = range(5, 0, -1)
parallelSort(bar, foo)
print foo # [4,3,2,1,0]
print bar # [1,2,3,4,5]

I've seen the examples using zip but it seems silly to copy all your data from parallel lists to a list of tuples and back again if this can be easily avoided.

4
  • 1
    What do you think this parallelSort would do? It appears from your comments that it sort foo in decreasing order and bar in increasing order -- is that right? Commented Feb 8, 2010 at 15:55
  • @Paul: It sorts bar and manipulates foo in lockstep. Commented Feb 8, 2010 at 15:58
  • What will parallelSort gives if initially foo is [2,4,6,10,8] and bar is [3,7,9,5,1]? Commented Feb 8, 2010 at 15:58
  • @Kenny: bar is now sorted and equals [1,3,5,7,9]. foo is manipulated in lockstep and now equals [8,2,10,4,6]. Commented Feb 8, 2010 at 15:59

4 Answers 4

6

Here's an easy way:

perm = sorted(xrange(len(foo)), key=lambda x:foo[x])

This generates a list of permutations - the value in perm[i] is the index of the ith smallest value in foo. Then, you can access both lists in order:

for p in perm:
  print "%s: %s" % (foo[p], bar[p])

You'd need to benchmark it to find out if it's any more efficient, though - I doubt it makes much of a difference.

Sign up to request clarification or add additional context in comments.

3 Comments

Change range to xrange if you want to make a difference. Unless you're using Python 3.
Hm, true. Or use .sort instead of sorted, but that ruins the one-liner-ness. ;)
turns out this is no better than sorting them out-of-place, because sorted will greedily allocate lots of memory, e.g. sorted(range(10**6),key=lambda x:x). (by range I mean xrange, it's been changed in python3) You will notice a significant chunk of your RAM disappear when you do this. Turns out that sorted is smart enough not to sort range though, so beware of testing without a key= function.
3

Is there an easy way? Yes. Use zip.

Is there an "easy way that doesn't use a zip variant"? No.

If you wanted to elaborate on why you object to using zip, that would be helpful. Either you're copying objects, in which case Python will copy by reference, or you're copying something so lightweight into a lightweight tuple as to not be worthy of optimization.

If you really don't care about execution speed but are especially concerned for some reason about memory pressure, you could roll your own bubble sort (or your sort algorithm of choice) on your key list which swaps both the key list and the target lists elements when it does a swap. I would call this the opposite of easy, but it would certainly limit your working set.

4 Comments

Just because you can't think of an easy way that doesn't use zip doesn't mean there isn't one - see my answer. :)
Your answer is zipping by another name, so I stand behind "there is no easy way that doesn't use a zip variant". This was a silly question to begin with, though, so if sorting in memory what are essentially tuples of (sort_value, index) is preferable to sorting tuples of (sort_value, target_value), fine.
"Zipping by another name"? It's certainly not - it doesn't have anything to do with zipping, and doesn't modify the original elements at all. Indeed, it doesn't even touch the second array.
Zipping just means constructing an iterable of tuples from two or more source iterables. What do you think sorted(indices, key=foo) is doing? It's constructing a list of tuples of (foo_val, index), then sorting them. That is indistinguishable from zipping (foo_val, target_val) and sorting.
0

To achieve this, you would have to implement your own sort.

However: Does the unnecessary copying really hurt your application? Often parts of Python strike me as inefficient, too, but they are efficient enough for what I need.

8 Comments

I see your point about avoiding premature optimization, but sometimes (this being one such case) I like to write generic code and know that if I ever use it on a huge dataset or something it will "just work". In this case I'm more worried about running out of memory than about speed.
and wouldn't your own sort involve use of zip, dict, etc?
No. Say you implement your own quicksort - you can make sure to make any swaps on both lists.
I would suspect any parallel sort implemented in Python for this would use a lot more time and a little more memory. The big-O speed and space cannot be improved, and you're adding a lot of overhead to do this in pure Python.
@dimcha, If you have a large dataset and are running out of memory, the solution may be this, but it may be to use a numpy array, or an array.array, or a database, etc. Consider whether this is the solution that will really help.
|
0

Any solution I can imagine short of introducing a sort from scratch uses indices, or a dict, or something else that really is not apt to save you memory. In any event, using zip will only increase memory usage by a constant factor, so it is worth making sure this is really a problem before a solution.

If it does get to be a problem, there may be more effective solutions. Since the elements of foo and bar are so closely related, are you sure their right representation is not a list of tuples? Are you sure they should not be in a more compact data structure if you are running out of memory, such as a numpy array or a database (the latter of which is really good at this kind of manipulation)?

(Also, incidentally, itertools.izip can save you a little bit of memory over zip, though you still end up with the full zipped list in list form as the result of sorted.)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.