I am looking for a function ideally in pure python that is similar to numpy.argsort in that it returns only a list of sorted indices while leaving the original arrays untouched, yet it needs to be able to sort on data contained in multiple arrays.
Example:
>>> names = ['xavier', 'bob', 'billy', 'jene', 'samson']
>>> ages = [15, 32, 63, 32, 15]
>>>indexes = sort by ages and then by names
[4, 0, 1, 3, 2]
>>> for i in indexes:
>>> print "Name", names[i]
>>> print "Age", ages[i]
The sorting function cannot create extra data structures, meaning list comprehension or functions like zip are out of the question. Each array consists of 5 million objects, generating zipped version of the arrays explodes the memory requirements by a factor of at least 3. Using list comprehension such as sorted(..key=lambda x:(names[x],ages[x])) causes a slowdown such as the sort takes over a minute to complete (and the memory requirements to create these intermediary tuples)
So far, as long as I only want to sort on a single array it is fast enough, however since the indices list does not know about the other arrays, I am unable to call multiple "sort" operations as I would if I had a zipped version of the two lists.
newlist = zip(ages, names)creates a brand new list, it it is at least making an exact copy of all of the data, not including the overhead of the managing this new list.indexes.sort(key=lambda x:(ages[x],names[x]))takes minutes to sort.