Sorting zipped list of scalars and numpy arrays: not handling duplicates

Question

I've been using this structure to sort vectors (the arrays) by some property of the vector. This structure (sorting vectors by zipping them with scalars and sorting by the scalars) has been working in other parts of my code, but in this case it fails with the error: The truth value of an array with more than one element is ambiguous. The failure depends on there being duplicate values among the scalars (see below).

from numpy import array

pnts =[array([ 0.        ,  0.45402743, -0.64209154]), 
       array([-0.27803373,  0.45402743, -0.64209154]), 
       array([-0.64874546,  0.45402743,  0.        ]), 
       array([-0.27803373,  0.45402743,  0.64209154]), 
       array([ 0.        ,  0.45402743,  0.64209154]), 
       array([ 0.        , -0.45402743,  0.64209154]), 
       array([-0.27803373, -0.45402743,  0.64209154]), 
       array([-0.64874546, -0.45402743,  0.        ]), 
       array([-0.27803373, -0.45402743, -0.64209154]), 
       array([ 0.        , -0.45402743, -0.64209154]), 
       array([-0.46338972,  0.        ,  0.64209154]), 
       array([-0.46338972,  0.        , -0.64209154]), 
       array([-0.83410135,  0.        ,  0.        ])]

ds = [0.64209154071986396, 0.69970301064027385, 0.64874545642786008, 
        0.69970301064027385, 0.64209154071986396, 0.64209154071986396, 
        0.69970301064027385, 0.64874545642785986, 0.69970301064027385, 
        0.64209154071986396, 0.79184062463701899, 0.79184062463701899, 
        0.83410134835400274]

pnts = [pnt for (d,pnt) in sorted(zip(ds,pnts))] #sort by distances ds
print(pnts)

However if I shorten it to the first 3 points, it does work:

from numpy import array

pnts =[array([ 0.        ,  0.45402743, -0.64209154]), 
   array([-0.27803373,  0.45402743, -0.64209154]), 
   array([-0.64874546,  0.45402743,  0.        ])]

ds = [0.64209154071986396, 0.69970301064027385, 0.64874545642786008]

pnts = [pnt for (d,pnt) in sorted(zip(ds,pnts))]
print(pnts)

>[array([ 0.        ,  0.45402743, -0.64209154]), array([-0.64874546,  0.45402743,  0.        ]), array([-0.27803373,  0.45402743, -0.64209154])]

I'm sure the issue is that there are duplicates among the ds. When I go from 3 to 4 points, where the first duplicate appears, it fails again. But other sorting routines in Python work fine when there are duplicates. Why not this one?

In general this ValueError is produced when you try to compare 2 arrays and use the result as a scalar boolean. The classic case is if A==B:. Here it's because the sort is trying to compare two arrays from the pnts list.
– hpaulj, Commented Feb 15, 2017 at 19:53

user2357112 · Accepted Answer · 2017-02-15 18:19:19Z

1

You're not sorting pnts by the ds values. You're sorting the elements of zip(ds, pnts). Those are tuples, which are ordered lexicographically; if you compare (x, y) to (x, z), the comparison will find the first elements equal and move on to comparing y and z. Since the second elements of your tuples are NumPy arrays, which don't have an ordering relation*, the sort fails.

If you want to sort by the ds values, specify a sort key that picks out those values:

sorted(zip(ds, pnts), key=lambda x: x[0])

or

import operator
sorted(zip(ds, pnts), key=operator.itemgetter(0))
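As a minimal, self-contained sketch of the key-based sort (the small ds and pnts below are made-up illustration data, not the question's full set), the following shows that duplicates in ds no longer cause a problem, because only the scalar in position 0 is ever compared:

```python
from operator import itemgetter

import numpy as np

# Illustrative data only; note the duplicate distance 0.5.
ds = [0.5, 0.7, 0.5, 0.6]
pnts = [np.array([0.0, 1.0]), np.array([2.0, 3.0]),
        np.array([4.0, 5.0]), np.array([6.0, 7.0])]

# The key picks out the scalar, so the arrays themselves are never compared.
pairs = sorted(zip(ds, pnts), key=itemgetter(0))

sorted_ds = [d for d, _ in pairs]
sorted_pnts = [pnt for _, pnt in pairs]
```

Because sorted is stable, the two points that share the distance 0.5 keep their original relative order.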

*specifically, if you compare two NumPy arrays with an operator like <, instead of telling you if the first array is somehow "less than" the other, it gives you an array of broadcasted elementwise comparison results.
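The footnote can be checked directly. This sketch uses two made-up arrays, a and b, to show that < returns an elementwise result and that asking for a single truth value of that result raises the ValueError from the question:

```python
import numpy as np

# Made-up arrays for illustration.
a = np.array([1.0, 5.0, 3.0])
b = np.array([2.0, 4.0, 3.0])

# Elementwise comparison: one boolean per position, not a single answer.
elementwise = a < b

try:
    bool(elementwise)   # a sort needs one True/False here
    raised = False
except ValueError:
    raised = True       # "The truth value of an array ... is ambiguous"
```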

answered Feb 15, 2017 at 18:19

user2357112


2 Comments

hess8 Over a year ago

Your solution worked! And thanks for the explanation.

hess8 Over a year ago

It's interesting that the errors didn't show up in the other part of my code where there were "duplicates" because numerical noise made them slightly different in the last digits.
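Two of the distances listed in the question appear to illustrate this kind of noise: the third and eighth entries of ds agree to about 15 digits but are distinct floats, so a tuple comparison between them would be decided by the scalars and never reach the arrays. A quick check (the names d_a and d_b are introduced here for illustration):

```python
d_a = 0.64874545642786008   # third distance in the question's ds
d_b = 0.64874545642785986   # eighth distance in the question's ds

# Nearly identical, but not equal as floats.
are_equal = d_a == d_b
difference = d_a - d_b
```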

Paul Panzer · Accepted Answer · 2017-02-15 20:09:14Z

1

Tuples in Python are compared lexicographically. This comparison short-circuits, i.e. if the first elements of the two tuples differ, the remaining elements are skipped because they cannot change the order. This is why you do not see this error when there are no duplicates.
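A small sketch of the short-circuit behaviour, using made-up arrays a and b: the comparison succeeds when the scalars differ and only fails once a tie forces Python to look at the arrays.

```python
import numpy as np

# Made-up arrays for illustration.
a = np.array([0.0, 1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])

# First elements differ: the result is decided by 1.0 < 2.0 alone.
different_first = (1.0, a) < (2.0, b)

# First elements are equal: Python moves on to compare a with b and fails.
try:
    (1.0, a) < (1.0, b)
    tie_failed = False
except ValueError:
    tie_failed = True
```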

One solution would be to use np.argsort:

import numpy as np
order = np.argsort(ds)                    # indices that would sort ds
pnts_sorted = np.array(pnts)[order, :]    # reorder the rows accordingly

This avoids the zipping and returns your sorted points as a 2d array, which for many uses is the more convenient structure. If you still want a list of arrays: list(pnts_sorted) will give you one.

np.argsort performs an indirect sort: instead of moving the elements of its argument, it records how they would have to be moved to get them sorted. This "shuffle recipe" (just an array of integers, each indicating which element of the array being sorted would have to go in that position) can be applied to other arrays if they have the same number of elements along the sort axis. In the code snippet we convert pnts to a 2D array (because order cannot be used to index a list) and then use order to sort the rows according to ds. (The colon in the index tells NumPy to apply the shuffle to entire rows.)
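To make the "shuffle recipe" concrete, here is a sketch on made-up data. The kind="stable" argument is an addition not used in the snippet above; it is there only so that tied distances keep their original order and the result is predictable.

```python
import numpy as np

# Made-up data for illustration; 0.3 appears twice.
ds = np.array([0.3, 0.1, 0.3, 0.2])
pnts = np.array([[0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [2.0, 2.0, 2.0],
                 [3.0, 3.0, 3.0]])

# order[i] is the index of the element that belongs in position i.
order = np.argsort(ds, kind="stable")

# The same recipe reorders both the distances and the rows of pnts.
ds_sorted = ds[order]
pnts_sorted = pnts[order, :]
```

Here order comes out as [1, 3, 0, 2], so row 1 of pnts moves to the front, followed by rows 3, 0 and 2.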

Finally, if I may, a piece of general advice. Unless there are compelling reasons not to, it is typically advisable to keep this sort of data (both ds and pnts) in arrays, not lists. For example, sorting an array will typically be much faster than sorting a list (unless you sort the list using np.sort, but that is only because np.sort returns an array even if you feed it a list).

edited Feb 15, 2017 at 20:09

answered Feb 15, 2017 at 18:27

Paul Panzer


2 Comments

hess8 Over a year ago

Thanks Paul. That's a good way to sort...I'll try that. The lists are so nice because it's easy to add to them rather than to np.arrays.

Paul Panzer Over a year ago

Yes, if your data size changes often and unpredictably that's a reason to use lists. If it's just a few times, np.concatenate may still be the better choice. Of course, if performance is not a consideration then lists may be more convenient in some situations.
