Sort array columns based upon sum

Question

Let's suppose I have an array as such:

np.array([[1., 1., 0.],
          [0., 4., 0.],
          [8., 0., 8.],
          [0., 0., 0.],
          [5., 0., 0.],
          [2., 2., 2.]])

Here column[0] sums to 16, column[1] to 7, and column[2] to 10.

How do I efficiently rearrange the columns in NumPy so that they are ordered by column sum, greatest to least? In the above example, column[0] would stay in place and column[1] and column[2] would switch positions.

You can also try: np.array(list(zip(*sorted(zip(*arr), key=sum, reverse=True))))
– Heaven, commented Sep 7, 2018 at 6:46
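Heaven's one-liner transposes the array with zip(*arr), sorts the column tuples in Python by their sums, and transposes back. A minimal sketch (an illustrative check, not from the thread) confirming that it agrees with the vectorized argsort approach on the example array:

```python
import numpy as np

arr = np.array([[1., 1., 0.],
                [0., 4., 0.],
                [8., 0., 8.],
                [0., 0., 0.],
                [5., 0., 0.],
                [2., 2., 2.]])

# Pure-Python version: zip(*arr) yields the columns as tuples,
# sorted() orders them by sum (descending), and zip(*...) transposes back to rows.
python_result = np.array(list(zip(*sorted(zip(*arr), key=sum, reverse=True))))

# Vectorized version: argsort the column sums, then reverse for descending order.
numpy_result = arr[:, np.argsort(arr.sum(axis=0))[::-1]]

print(np.array_equal(python_result, numpy_result))  # True
```

Both agree here because no two columns share a sum. The Python version iterates over the data in Python, so for large arrays the argsort approach should scale better.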

Space Impact · Accepted Answer · 2018-09-07 06:38:21Z

7

You can sum along axis=0, use argsort to get the column order, and then reverse that order so the largest sum comes first:

a[:,np.argsort(a.sum(axis=0))[::-1]]

array([[1., 0., 1.],
       [0., 0., 4.],
       [8., 8., 0.],
       [0., 0., 0.],
       [5., 0., 0.],
       [2., 2., 2.]])

answered Sep 7, 2018 at 6:38

Space Impact


Silpara · Accepted Answer · 2018-09-07 06:38:33Z

1

Using a combination of np.sum and np.argsort, you can achieve this as follows:

x = np.array([[1., 1., 0.],[0., 4., 0.],[8., 0., 8.],[0., 0., 0.],[5., 0., 0.],[2., 2., 2.]])
x[:, np.argsort(-np.sum(x, 0))]
array([[ 1.,  0.,  1.],
       [ 0.,  0.,  4.],
       [ 8.,  8.,  0.],
       [ 0.,  0.,  0.],
       [ 5.,  0.,  0.],
       [ 2.,  2.,  2.]])

answered Sep 7, 2018 at 6:38

Silpara


1 Comment

zvone Over a year ago

Negating with -np.sum(x, 0) is simpler than flipping the indices at the end. Nice.
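For this array the two spellings give the same result, but they can differ when column sums tie. A minimal sketch on a hypothetical array whose first two columns both sum to 3: reversing an ascending argsort puts tied columns in reverse of their original order, while a stable sort on the negated sums keeps them left to right. kind="stable" asks np.argsort for a stable sort; the default sort kind is not guaranteed to be stable.

```python
import numpy as np

# Hypothetical array: columns 0 and 1 both sum to 3, column 2 sums to 5.
b = np.array([[1., 2., 5.],
              [2., 1., 0.]])
sums = b.sum(axis=0)  # array([3., 3., 5.])

# Flip an ascending stable sort: the tied columns come out as 1, 0.
flipped = np.argsort(sums, kind="stable")[::-1]

# Stable sort on the negated sums: the tied columns keep their order 0, 1.
negated = np.argsort(-sums, kind="stable")

print(flipped.tolist())  # [2, 1, 0]
print(negated.tolist())  # [2, 0, 1]
```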

zvone · Accepted Answer · 2018-09-07 06:45:03Z

1

Swapping the last two columns is done this way:

a = np.array([[1., 1., 0.],
             [0., 4., 0.],
             [8., 0., 8.],
             [0., 0., 0.],
             [5., 0., 0.],
             [2., 2., 2.]])

result = a[:, [0, 2, 1]]

So what you need is to calculate those indices, [0, 2, 1], from the column sums.

This gets you the sums of all columns:

a.sum(axis=0)  # array([16.,  7., 10.])

and from that, you get the indices for sorting:

np.argsort(np.array([16.,  7., 10.]))   # [1, 2, 0]

You need to flip it to get the highest-to-lowest order:

np.flip([1, 2, 0])   # [0, 2, 1]

So, all together, it is:

result = a[:, np.flip(np.argsort(a.sum(axis=0)))]
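The same steps can be wrapped in a small helper. This is only a sketch; the function name sort_columns_by_sum and its descending flag are hypothetical, not part of NumPy.

```python
import numpy as np

def sort_columns_by_sum(a, descending=True):
    """Return a copy of the 2-D array a with its columns ordered by column sum (hypothetical helper)."""
    sums = a.sum(axis=0)        # one sum per column
    order = np.argsort(sums)    # column indices that sort the sums ascending
    if descending:
        order = np.flip(order)  # highest sum first
    return a[:, order]          # fancy indexing returns a new array

a = np.array([[1., 1., 0.],
              [0., 4., 0.],
              [8., 0., 8.],
              [0., 0., 0.],
              [5., 0., 0.],
              [2., 2., 2.]])

print(sort_columns_by_sum(a).sum(axis=0).tolist())                    # [16.0, 10.0, 7.0]
print(sort_columns_by_sum(a, descending=False).sum(axis=0).tolist())  # [7.0, 10.0, 16.0]
```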

answered Sep 7, 2018 at 6:45

zvone


Paolo Irrera · Accepted Answer · 2018-09-07 06:40:56Z

0

You can do something like this:

import numpy as np

def main():
    a = np.array([[1., 1., 0.],
                  [0., 4., 0.],
                  [8., 0., 8.],
                  [0., 0., 0.],
                  [5., 0., 0.],
                  [2., 2., 2.]])
    col_sum = np.sum(a, axis=0)
    sort_index = np.argsort(-col_sum)  # negate the sums so argsort yields descending order
    out_matrix = a[:, sort_index]
    print(out_matrix)

if __name__ == "__main__":
    main()

Note that a[:, sort_index] uses fancy indexing, which always returns a new array rather than a view, so out_matrix is a new instance and a itself is left unchanged.
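To get the reordered columns back into a itself, one option is slice assignment: the right-hand side is computed as a temporary copy first, then written into the memory a already owns. A minimal sketch; the view variable exists only to show that a's buffer is reused:

```python
import numpy as np

a = np.array([[1., 1., 0.],
              [0., 4., 0.],
              [8., 0., 8.],
              [0., 0., 0.],
              [5., 0., 0.],
              [2., 2., 2.]])
view = a[0]  # basic indexing: a view into a's first row

sort_index = np.argsort(-a.sum(axis=0))
# a[:, sort_index] builds a temporary reordered copy; a[:] = ... copies it
# back into a's existing buffer, so views of a see the change too.
a[:] = a[:, sort_index]

print(a[0].tolist())   # [1.0, 0.0, 1.0]
print(view.tolist())   # [1.0, 0.0, 1.0]
```

This still allocates a temporary array, so it does not save memory; it only keeps the same array object (and any views of it) up to date.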

answered Sep 7, 2018 at 6:40

Paolo Irrera


Amadan · Accepted Answer · 2018-09-07 06:42:52Z

0

import numpy as np

arr = np.array([[1., 1., 0.],
                [0., 4., 0.],
                [8., 0., 8.],
                [0., 0., 0.],
                [5., 0., 0.],
                [2., 2., 2.]])

perm = np.flip(np.argsort(np.sum(arr, axis=0)))
result = arr[:, perm]

Get the column sums, then get the permutation (array of indices) that sorts them. argsort sorts in ascending order, so reverse the permutation to get the indices from highest sum to lowest. Finally, reorder the columns of the original array with that permutation.
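Because perm is just an array of column indices, the same permutation can reorder anything else aligned with the columns, such as the sums themselves or a set of column names. A minimal sketch; the names array is hypothetical:

```python
import numpy as np

arr = np.array([[1., 1., 0.],
                [0., 4., 0.],
                [8., 0., 8.],
                [0., 0., 0.],
                [5., 0., 0.],
                [2., 2., 2.]])
names = np.array(["x", "y", "z"])  # hypothetical labels, one per column

sums = np.sum(arr, axis=0)
perm = np.flip(np.argsort(sums))   # column indices, highest sum first

print(perm.tolist())         # [0, 2, 1]
print(sums[perm].tolist())   # [16.0, 10.0, 7.0]
print(names[perm].tolist())  # ['x', 'z', 'y']
```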

edited Sep 7, 2018 at 6:42

answered Sep 7, 2018 at 6:37

Amadan


1 Comment

Amadan Over a year ago

@AkshayNevrekar: Right. Thanks.

U13-Forward · Accepted Answer · 2018-09-07 06:55:50Z

0

Or you can use pandas:

>>> import pandas as pd, numpy as np
>>> arr = np.array([[1., 1., 0.],
...                 [0., 4., 0.],
...                 [8., 0., 8.],
...                 [0., 0., 0.],
...                 [5., 0., 0.],
...                 [2., 2., 2.]])
>>> df = pd.DataFrame(arr)
>>> df[df.sum().sort_values(ascending=False).index].values
array([[1., 0., 1.],
       [0., 0., 4.],
       [8., 8., 0.],
       [0., 0., 0.],
       [5., 0., 0.],
       [2., 2., 2.]])
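One advantage of going through pandas is that column labels travel with the columns. A minimal sketch, with hypothetical column names x, y and z, that orders the columns by their sums and keeps the result as a DataFrame:

```python
import numpy as np
import pandas as pd

arr = np.array([[1., 1., 0.],
                [0., 4., 0.],
                [8., 0., 8.],
                [0., 0., 0.],
                [5., 0., 0.],
                [2., 2., 2.]])
df = pd.DataFrame(arr, columns=["x", "y", "z"])  # hypothetical labels

# df.sum() returns one sum per column (a Series indexed by column label);
# sorting it in descending order gives the labels in the desired column order.
order = df.sum().sort_values(ascending=False).index
sorted_df = df[order]

print(sorted_df.columns.tolist())        # ['x', 'z', 'y']
print(sorted_df.to_numpy()[0].tolist())  # [1.0, 0.0, 1.0]
```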

edited Sep 7, 2018 at 6:55

answered Sep 7, 2018 at 6:42

U13-Forward


2 Comments

zvone Over a year ago

Here is a loop that is executed in Python rather than in NumPy: sorted(df.columns.tolist(), key=lambda x: df[x].sum(), reverse=True). That is not efficient. The whole point of using NumPy is to avoid loops in Python.

U13-Forward Over a year ago

@zvone Edited mine, much cleaner
