
Is there a way to sort the rows of an array by the last element, in this case the cell IDs? The cell ID is built as follows: "CellID_NumberOfCell".

arr =np.array([['65.0','30.0','20.0','0.0','0_0'],
 ['2.0','29.0','24.0','0.0','1_0'],
 ['0.0','18.0','4.0','0.0','2_0'],
 ['16.0','9.0','0.0','9990.0','7_203'],
 ['16.0','9.0','0.0','9990.0','0_203'],
 ['20.0','23.0','31.0','9990.0','8_158'],
 ['65.0','30.0','20.0','0.0','0_10']])

So after sorting it should look like:

arr =np.array([['65.0','30.0','20.0','0.0','0_0'],
 ['65.0','30.0','20.0','0.0','0_10'],
 ['16.0','9.0','0.0','9990.0','0_203'],
 ['2.0','29.0','24.0','0.0','1_0'],
 ['0.0','18.0','4.0','0.0','2_0'],
 ['16.0','9.0','0.0','9990.0','7_203'],
 ['20.0','23.0','31.0','9990.0','8_158']])

EDIT:

Is it also possible to delete the numbers after the underscore after sorting, so that I just have the ID? Instead of 0_0, just 0.

EDIT 2:

After sorting by ID, the rows should also be sorted by time, so that all rows with ID 0, for example, are ordered by time 0, 1, ... 9999, etc.

  • Edit the question title to reflect its intention; something like, "How to sort a NumPy array by the last element of each row?" Commented Jun 7, 2017 at 11:07
  • How can I check which answer has the best runtime? :) Commented Jun 7, 2017 at 11:34
  • @Varlor Use an input with : arr[np.random.randint(0,arr.shape[0],(1000))] to test out all approaches? You may vary that 1000 there. Commented Jun 7, 2017 at 11:39
  • @Divakar Hey, how does this function work? Does it analyze the structure of my input and generate 1000 random rows in the same shape? Commented Jun 7, 2017 at 11:51
  • @Varlor Basically, it picks rows of arr in random order, with repeats, and gives us a (1000,5)-shaped array. Commented Jun 7, 2017 at 11:52
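
The benchmarking idea from these comments could be sketched roughly as follows. This is only an illustration: the helper names by_argsort and by_sorted are hypothetical, the repeat count of 20 is arbitrary, and timings will vary by machine.

```python
import timeit
import numpy as np

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

# Draw 1000 row indices with repeats, giving a (1000, 5) test input
big = arr[np.random.randint(0, arr.shape[0], 1000)]

def by_argsort(a):
    # Hypothetical wrapper: lexicographic sort via np.argsort
    return a[np.argsort(a[:, -1])]

def by_sorted(a):
    # Hypothetical wrapper: lexicographic sort via Python's sorted()
    return np.array(sorted(a, key=lambda row: row[-1]))

for fn in (by_argsort, by_sorted):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(fn.__name__, seconds)
```

Vary the 1000 to see how each approach scales with the number of rows.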

3 Answers


np.argsort(arr[:, -1]) will give you the permutation so that elements of the last column of arr are ordered.

Then, arr[np.argsort(arr[:, -1])] reorders the rows of arr according to this permutation.
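
As a minimal sketch of those two steps, using a shortened version of the question's array (the variable names order and sorted_arr are just for illustration):

```python
import numpy as np

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_0'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['16.0', '9.0', '0.0', '9990.0', '0_203'],
                ['65.0', '30.0', '20.0', '0.0', '0_10']])

# Indices that would sort the last column (compared as strings)
order = np.argsort(arr[:, -1])

# Reorder whole rows with that permutation
sorted_arr = arr[order]
print(sorted_arr[:, -1])   # ['0_0' '0_10' '0_203' '1_0']
```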

Beware that lexicographic order is used since your data consists of strings, so 0_10 comes before 0_2. If this is not what you want, split the last column into its numeric parts; I advise using a pandas.DataFrame for that:

import pandas as pd
df = pd.DataFrame(arr)
# Split the last column once on '_'; expand=True returns the two parts as columns
df[['Cell', 'CellIndex']] = df[df.columns[-1]].str.split('_', n=1, expand=True)
df['Cell'] = df['Cell'].astype(int)
df['CellIndex'] = df['CellIndex'].astype(int)
df.sort_values(['Cell', 'CellIndex'])

pandas is really the way to go to manipulate this kind of data.
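
If a NumPy array is needed at the end, the round trip might look like the sketch below. The function name sort_by_cell_id is hypothetical, and the sketch assumes every entry in the last column has the form "<int>_<int>".

```python
import numpy as np
import pandas as pd

def sort_by_cell_id(arr):
    """Sort rows by (cell ID, cell index), both compared as integers."""
    df = pd.DataFrame(arr)
    # Split the last column into two integer columns
    parts = df.iloc[:, -1].str.split('_', n=1, expand=True).astype(int)
    parts.columns = ['Cell', 'CellIndex']
    # Row labels in sorted order, then back to a NumPy array
    order = parts.sort_values(['Cell', 'CellIndex']).index
    return df.loc[order].to_numpy()

arr = np.array([['65.0', '30.0', '20.0', '0.0', '0_10'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['16.0', '9.0', '0.0', '9990.0', '0_2']])
result = sort_by_cell_id(arr)
print(result[:, -1])   # 0_2 now comes before 0_10
```

Note that the returned array has object dtype rather than a fixed-width string dtype.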


4 Comments

Is arr really a numpy array ? What is type(arr) ? Try arr = np.array(arr)
Yeah, that was my mistake. But the next problem is that the output is now a pandas DataFrame. Is it possible to cast it back to a NumPy array? :)
I added an edit. Is it also possible to do that after sorting?
@Varlor arr = np.array(df). Pandas relies heavily on numpy :)

We need to split the last column by that underscore, lexsort it and then use those indices to sort the input array.

Thus, an implementation would be -

import numpy as np

def numpy_app(arr):
    # Split each string in the last column around '_'. The result has
    # three columns: the part before '_', the '_' itself, and the part after.
    a = np.char.partition(arr[:,-1],'_')

    # Lexsort on the two numeric columns (0 and 2). np.lexsort treats the
    # last key as primary, so flip the column order to give precedence to the ID.
    sidx = np.lexsort(a[:,2::-2].astype(int).T)

    # Index into the input array with the lex-sorted indices for the final output
    return arr[sidx]
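
To make the key ordering concrete, here is a small sketch on just the ID strings. The variable names are illustrative, and it uses np.char.partition, the public alias of the defchararray function.

```python
import numpy as np

ids = np.array(['9_49', '1_0', '9_5', '9_50'])

# Each row becomes [before, '_', after], e.g. ['9', '_', '49']
parts = np.char.partition(ids, '_')

# Columns 2 and 0 (in that order), transposed: row 0 holds the cell
# indices, row 1 the IDs. lexsort uses the LAST row as the primary key.
keys = parts[:, 2::-2].astype(int).T
order = np.lexsort(keys)

print(ids[order])   # ['1_0' '9_5' '9_49' '9_50']
```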

Based on the edits in the question, it seems we want to cut out the string after the underscore. To do so, here's a modified version -

def numpy_cut_app(arr):
    a = np.char.partition(arr[:,-1],'_')
    sidx = np.lexsort(a[:,2::-2].astype(int).T)
    out = arr[sidx]

    # Replace the last column with the part before '_' (the ID), in sorted order
    out[:,-1] = a[sidx,0]
    return out

Based on more edits, it seems we want to include the fourth column into lex-sorting and neglect everything after the underscore in the last column. So, a further modified version would be -

def numpy_cut_col3_app(arr):
    a = np.char.partition(arr[:,-1],'_')

    # Lex-sort using the ID (part before '_', primary key) and col-3 of the
    # input array (secondary key). Cast the ID to int so that multi-digit
    # IDs compare numerically rather than as strings.
    sidx = np.lexsort([arr[:,3].astype(float), a[:,0].astype(int)])
    out = arr[sidx]
    out[:,-1] = a[sidx,0]
    return out

Sample runs -

In [567]: arr
Out[567]: 
array([['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']], 
      dtype='|S6')

In [568]: numpy_app(arr)
Out[568]: 
array([['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']], 
      dtype='|S6')

In [569]: numpy_cut_app(arr)
Out[569]: 
array([['2.0', '29.0', '24.0', '0.0', '1'],
       ['0.0', '18.0', '4.0', '0.0', '2'],
       ['16.0', '9.0', '0.0', '9990.0', '7'],
       ['20.0', '23.0', '31.0', '9990.0', '8'],
       ['16.0', '9.0', '0.0', '9990.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9']], 
      dtype='|S6')
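
A run of the third variant on the same sample could look like the following sketch. The ID key is cast to int here so that multi-digit IDs compare numerically, and the array is written with plain Python 3 strings, so the dtype differs from the '|S6' shown in the runs above.

```python
import numpy as np

def numpy_cut_col3_app(arr):
    a = np.char.partition(arr[:, -1], '_')
    # Primary key: ID as int; secondary key: time (column 3) as float
    sidx = np.lexsort([arr[:, 3].astype(float), a[:, 0].astype(int)])
    out = arr[sidx]          # fancy indexing copies, so arr is untouched
    out[:, -1] = a[sidx, 0]  # keep only the ID in the last column
    return out

arr = np.array([['65.0', '30.0', '20.0', '0.0', '9_49'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['0.0', '18.0', '4.0', '0.0', '2_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203'],
                ['16.0', '9.0', '0.0', '9990.0', '9_5'],
                ['20.0', '23.0', '31.0', '9990.0', '8_158'],
                ['65.0', '30.0', '20.0', '0.0', '9_50']])

result = numpy_cut_col3_app(arr)
print(result)
```

Within ID 9, the two rows with time 0.0 now come before the row with time 9990.0.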

11 Comments

Nice! The problem now is something like this: ['10.0' '33.0' '14.0' '2505.0' '9_49'] ['1.0' '12.0' '15.0' '180.0' '9_5'] ['12.0' '3.0' '15.0' '2520.0' '9_50']. 5 is sorted between 49 and 50.
I added an edit. Is it also possible to do that after sorting?
@Varlor Updated. Fixed that 5, 49, 50 sorting issue.
OK, thank you very much! Unfortunately I made a mistake in my question. It should be sorted by the ID like you did it, but also by the time (column 3). So the output of your test should be: array([['2.0', '29.0', '24.0', '0.0', '1'], ['0.0', '18.0', '4.0', '0.0', '2'], ['16.0', '9.0', '0.0', '9990.0', '7'], ['20.0', '23.0', '31.0', '9990.0', '8'], ['65.0', '30.0', '20.0', '0.0', '9'], ['65.0', '30.0', '20.0', '0.0', '9'], ['16.0', '9.0', '0.0', '9990.0', '9']], dtype='|S6')
@Varlor By third col, do you mean arr[:,3] or arr[:,2]?

You can do it easily with sorted and a lambda function, then wrap the result in np.array, as suggested by @Divakar, to get a NumPy array back:

np.array(sorted(arr, key=lambda x :x[-1]))

output

[['65.0', '30.0', '20.0', '0.0', '0_0'],
['65.0', '30.0', '20.0', '0.0', '0_10'],
['16.0', '9.0', '0.0', '9990.0', '0_203'],
['2.0', '29.0', '24.0', '0.0', '1_0'],
['0.0', '18.0', '4.0', '0.0', '2_0'],
['16.0', '9.0', '0.0', '9990.0', '7_203'],
['20.0', '23.0', '31.0', '9990.0', '8_158']]
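
Because the key above compares the last element as a string, an ID such as 9_5 would sort after 9_49. A possible variation, sketched here with a hypothetical key function cell_key, compares both parts as integers instead:

```python
import numpy as np

def cell_key(row):
    # Hypothetical helper: '9_49' -> (9, 49), compared numerically
    cell_id, cell_index = row[-1].split('_')
    return int(cell_id), int(cell_index)

arr = np.array([['65.0', '30.0', '20.0', '0.0', '9_49'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['16.0', '9.0', '0.0', '9990.0', '9_5'],
                ['65.0', '30.0', '20.0', '0.0', '9_50']])

sorted_arr = np.array(sorted(arr, key=cell_key))
print(sorted_arr[:, -1])   # ['1_0' '9_5' '9_49' '9_50']
```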

EDIT: to drop the part after the underscore, you can use this. It is not pretty, but it does the job:

np.array([ np.append(i[:-1],i[-1].split("_")[0]) for i in sorted(list(arr), key=lambda x :x[-1])])

output

array([['65.0', '30.0', '20.0', '0.0', '0'],
       ['65.0', '30.0', '20.0', '0.0', '0'],
       ['16.0', '9.0', '0.0', '9990.0', '0'],
       ['2.0', '29.0', '24.0', '0.0', '1'],
       ['0.0', '18.0', '4.0', '0.0', '2'],
       ['16.0', '9.0', '0.0', '9990.0', '7'],
       ['20.0', '23.0', '31.0', '9990.0', '8']], 
      dtype='<U6')

5 Comments

If I use your approach, I get something like this (a list of arrays): [array(['65.0', '30.0', '20.0', '0.0', '0_0'], dtype='|S6'), array(['65.0', '30.0', '20.0', '0.0', '0_10'], dtype='|S6'), array(['16.0', '9.0', '0.0', '9990.0', '0_203'], dtype='|S6'), array(['2.0', '29.0', '24.0', '0.0', '1_0'], dtype='|S6'), array(['0.0', '18.0', '4.0', '0.0', '2_0'], dtype='|S6'), array(['16.0', '9.0', '0.0', '9990.0', '7_203'], dtype='|S6'), array(['20.0', '23.0', '31.0', '9990.0', '8_158'], dtype='|S6')]
@Varlor Wasn't that the original format? If not, can you provide a way for people to reproduce your data? If I enter your line arr =..., I get TypeError: list indices must be integers or slices, not str, so I assumed it was a nested list.
@Varlor Use np.array() to get back an array output.
I added an edit. Is it also possible to do that after sorting?
@Varlor Good. :) If everything is working, can you accept the answer that best fits your needs?
