python argsort indices based on multiple arrays

Question

I am looking for a function, ideally in pure Python, that is similar to numpy.argsort: it should return only a list of sorted indices and leave the original arrays untouched. Unlike numpy.argsort, it needs to sort on data contained in multiple arrays.

Example:

>>> names = ['xavier', 'bob', 'billy', 'jene', 'samson']
>>> ages = [15, 32, 63, 32, 15]
>>> indexes = ...  # pseudocode: sort by ages, then by names
>>> indexes
[4, 0, 1, 3, 2]
>>> for i in indexes:
...     print "Name", names[i]
...     print "Age", ages[i]

The sorting function cannot create extra data structures, so list comprehensions and functions like zip are out of the question. Each array consists of 5 million objects, and generating a zipped version of the arrays increases the memory requirements by a factor of at least 3. Sorting the indices with a tuple key, such as sorted(..., key=lambda x: (names[x], ages[x])), slows the sort to over a minute and also pays the memory cost of creating the intermediary tuples.

So far, sorting on a single array is fast enough. However, since the index list knows nothing about the other arrays, I am unable to chain multiple "sort" operations as I would with a zipped version of the two lists.

"The sorting function cannot create extra data structures" Impossible to fulfill.
– Ignacio Vazquez-Abrams, Commented Apr 1, 2012 at 21:03
No intermediary structures, such as a zipped list or new tuples returned by a lambda key. The problem with those approaches is that they duplicate the data. My data structures currently take around 600MB; if I have to create a zipped version of them, or a tuple for each element, memory usage grows to about 2GB, which is not realistic either.
– Gladius, Commented Apr 1, 2012 at 21:19
"they duplicate the data" No they don't. They copy the references.
– Ignacio Vazquez-Abrams, Commented Apr 1, 2012 at 21:19
newlist = zip(ages, names) creates a brand new list, so it is at least making an exact copy of all of the data, not including the overhead of managing this new list.
– Gladius, Commented Apr 1, 2012 at 21:27
indexes.sort(key=lambda x:(ages[x],names[x])) takes minutes to sort.
– Gladius, Commented Apr 1, 2012 at 21:27
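
The point under dispute in these comments can be checked directly. The sketch below uses Python 3 syntax and tiny stand-in lists, both of which are assumptions for illustration. It suggests that zipping copies references rather than the strings themselves, although each pair still costs one newly allocated tuple, which is where the extra memory would come from.

```python
import sys

names = ['xavier', 'bob', 'billy', 'jene', 'samson']
ages = [15, 32, 63, 32, 15]

# In Python 3, zip is lazy, so list() is needed to materialize the pairs.
pairs = list(zip(ages, names))

# Each tuple holds references to the existing objects, not copies of them.
same_objects = all(pair[1] is names[i] for i, pair in enumerate(pairs))
print(same_objects)  # True

# The per-element cost is the tuple itself; the exact size is
# implementation-specific, so no figure is claimed here.
print(sys.getsizeof(pairs[0]))
```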

mg. · Accepted Answer · 2012-04-01 22:33:07Z

3

This is the best I can think of. Small ints are cached singletons in CPython (-5 through 256, which covers the ages here), so the list created by the first sorted call should not create many brand new objects. The second sorted call creates a smaller list for each group; how small depends on how many distinct ages there are.

>>> import itertools, operator
>>> names = ['xavier', 'bob', 'billy', 'jene', 'samson']
>>> ages = [15, 32, 63, 32, 15]
>>> itemgetter = operator.itemgetter(1)
>>> sortedAges = sorted(enumerate(ages), key=itemgetter)
>>> for k, group in itertools.groupby(sortedAges, itemgetter):
...     g = sorted([(i, names[i]) for i, _ in group], key=itemgetter)
...     for i, name in g:
...         print 'Name:', name, 'Age:', ages[i]
... 
Name: samson Age: 15
Name: xavier Age: 15
Name: bob Age: 32
Name: jene Age: 32
Name: billy Age: 63
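
The same two-stage idea can be wrapped into a function that returns the index list the question asks for. This is a minimal sketch in Python 3 syntax, not the answer's original code. The name `argsort_two_keys` is hypothetical, and the function still allocates one index list plus a small temporary list per group.

```python
import itertools


def argsort_two_keys(primary, secondary):
    """Return indices ordered by primary, with ties broken by secondary.

    Hypothetical helper: sorts bare indices by the primary column, then
    re-sorts each run of equal primary values by the secondary column.
    """
    order = sorted(range(len(primary)), key=primary.__getitem__)
    result = []
    for _, run in itertools.groupby(order, key=primary.__getitem__):
        # Each run is consumed by sorted() before groupby advances.
        result.extend(sorted(run, key=secondary.__getitem__))
    return result


names = ['xavier', 'bob', 'billy', 'jene', 'samson']
ages = [15, 32, 63, 32, 15]
indexes = argsort_two_keys(ages, names)
print(indexes)  # [4, 0, 1, 3, 2]
```

Sorting bare indices with `__getitem__` as the key avoids building `(index, value)` tuples, so the only large allocation is the index list itself.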

edited Apr 1, 2012 at 22:33

answered Apr 1, 2012 at 22:17


Gladius · Accepted Answer · 2012-04-08 00:12:11Z

0

I created my own solution that works great.

Given the following data set:

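import random

# Materialize the list: reversed() returns a one-shot iterator that would be
# exhausted after the first comprehension, leaving names empty.
groups = list(reversed(range(5000000)))
ages = [random.randrange(0, 120) for x in groups]
names = ['foobar-%d' % random.randrange(0, 5000) for x in groups]

columns = dict(names=names, ages=ages, groups=groups)

def sort_on(col):
    # Return the indices that would sort columns[col]; the column is untouched.
    data = columns[col]
    idxs = list(range(len(data)))
    idxs.sort(key=lambda x: data[x])
    return idxs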
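
One way to extend `sort_on` to several columns is to rely on the stability of Python's sort: sort the index list once per column, least significant column first. The sketch below assumes Python 3 syntax and a small stand-in `columns` dict, and the name `sort_on_many` is hypothetical. Using the bound method `__getitem__` as the key avoids both the lambda call and per-element key tuples.

```python
def sort_on_many(columns, *cols):
    """Return indices sorted by cols[0], then cols[1], and so on.

    Hypothetical helper: list.sort is stable, so sorting by the least
    significant column first and the most significant last gives a
    multi-column ordering without building key tuples.
    Expects at least one column name.
    """
    idxs = list(range(len(columns[cols[0]])))
    for col in reversed(cols):
        idxs.sort(key=columns[col].__getitem__)
    return idxs


# Small stand-in data set (the names and ages from the question).
columns = dict(
    names=['xavier', 'bob', 'billy', 'jene', 'samson'],
    ages=[15, 32, 63, 32, 15],
)
print(sort_on_many(columns, 'ages', 'names'))  # [4, 0, 1, 3, 2]
```

Each pass is a full sort of the index list, so the cost grows with the number of columns, but memory stays at a single list of indices.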

answered Apr 8, 2012 at 0:12
