We need to split the last column at the underscore, lexsort on the resulting numeric parts, and then use those indices to sort the input array.
Thus, an implementation would be -
import numpy as np

def numpy_app(arr):
    # Split each string in the last column at '_'. For the given sample,
    # this yields three columns per row: the part before the underscore,
    # the '_' itself, and the part after it.
    a = np.char.partition(arr[:,-1],'_')
    # Lexsort on the two numeric columns (0 and 2). np.lexsort treats the
    # last key as the primary one, so flip the column order to give
    # precedence to the part before the underscore.
    sidx = np.lexsort(a[:,2::-2].astype(int).T)
    # Index into the input array with the lex-sorted indices for the final o/p
    return arr[sidx]
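To make the key ordering concrete, here is a minimal sketch on a standalone 1-D array. The values in `last_col` are hypothetical and chosen only for illustration; `np.char.partition` is assumed as the public alias for the same partition routine.

```python
import numpy as np

# Hypothetical last-column values, chosen only for illustration.
last_col = np.array(['9_49', '1_0', '9_5', '7_203'])

# partition splits each string at the first '_' into (before, sep, after).
parts = np.char.partition(last_col, '_')

# Pick columns 2 then 0, cast to int, and transpose so each key is a row.
keys = parts[:, 2::-2].astype(int).T

# np.lexsort uses the LAST key as the primary one, so the part before the
# underscore decides the order and the part after it breaks ties.
sidx = np.lexsort(keys)
sorted_col = last_col[sidx]
print(sorted_col)  # ['1_0' '7_203' '9_5' '9_49']
```

Note that `'9_5'` lands before `'9_49'` because the parts are compared as integers, not as text.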
Based on the edits in the question, it seems we want to drop the underscore and everything after it from the last column. To do so, here's a modified version -
def numpy_cut_app(arr):
    a = np.char.partition(arr[:,-1],'_')
    sidx = np.lexsort(a[:,2::-2].astype(int).T)
    out = arr[sidx]
    # Replace the last column with the part before the underscore,
    # taken in the same sorted order
    out[:,-1] = a[sidx,0]
    return out
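One detail worth checking is that overwriting the last column of the output does not touch the input, because indexing with an integer array returns a copy. A small sketch, using a hypothetical two-column array:

```python
import numpy as np

# Hypothetical two-row input; only the last column carries the 'x_y' strings.
arr = np.array([['2.0', '9_5'],
                ['1.0', '1_0']])

parts = np.char.partition(arr[:, -1], '_')
sidx = np.lexsort(parts[:, 2::-2].astype(int).T)

out = arr[sidx]               # integer-array indexing returns a copy
out[:, -1] = parts[sidx, 0]   # keep only the part before the underscore

print(out)        # last column is now ['1', '9']
print(arr[0, 1])  # still '9_5': the input array is unchanged
```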
Based on more edits, it seems we want to include the fourth column in the lex-sort and ignore everything after the underscore in the last column. So, a further modified version would be -
def numpy_cut_col3_app(arr):
    a = np.char.partition(arr[:,-1],'_')
    # Lex-sort using the part before the underscore from the last col
    # (precedence to it, since it is the last key) and col-3 of the input array.
    # Cast that part to int so that, e.g., '10' sorts after '9'.
    sidx = np.lexsort([arr[:,3].astype(float), a[:,0].astype(int)])
    out = arr[sidx]
    out[:,-1] = a[sidx,0]
    return out
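The choice of key dtype matters here: string keys are compared character by character, while numeric keys are compared by value. A short sketch of the difference, with hypothetical `prefix` and `col3` arrays standing in for the part before the underscore and the fourth column:

```python
import numpy as np

# Hypothetical keys: prefix = part before '_', col3 = fourth column.
prefix = np.array(['10', '9', '9', '2'])
col3 = np.array(['9990.0', '9990.0', '0.0', '0.0'])

# Primary key (last in the list): prefix as integers.
# Secondary key: col3 as floats, used only to break ties in prefix.
sidx_numeric = np.lexsort([col3.astype(float), prefix.astype(int)])
print(prefix[sidx_numeric])  # ['2' '9' '9' '10']

# With string keys, '10' sorts before '2' because '1' < '2' as characters.
sidx_string = np.lexsort([col3.astype(float), prefix])
print(prefix[sidx_string])   # ['10' '2' '9' '9']
```

In both cases the two rows with prefix `'9'` are ordered by `col3`, so the row with `0.0` comes before the row with `9990.0`.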
Sample runs -
In [567]: arr
Out[567]:
array([['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']],
      dtype='|S6')
In [568]: numpy_app(arr)
Out[568]:
array([['2.0', '29.0', '24.0', '0.0', '1_0'],
       ['0.0', '18.0', '4.0', '0.0', '2_0'],
       ['16.0', '9.0', '0.0', '9990.0', '7_203'],
       ['20.0', '23.0', '31.0', '9990.0', '8_158'],
       ['16.0', '9.0', '0.0', '9990.0', '9_5'],
       ['65.0', '30.0', '20.0', '0.0', '9_49'],
       ['65.0', '30.0', '20.0', '0.0', '9_50']],
      dtype='|S6')
In [569]: numpy_cut_app(arr)
Out[569]:
array([['2.0', '29.0', '24.0', '0.0', '1'],
       ['0.0', '18.0', '4.0', '0.0', '2'],
       ['16.0', '9.0', '0.0', '9990.0', '7'],
       ['20.0', '23.0', '31.0', '9990.0', '8'],
       ['16.0', '9.0', '0.0', '9990.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9'],
       ['65.0', '30.0', '20.0', '0.0', '9']],
      dtype='|S6')
Could you use arr[np.random.randint(0,arr.shape[0],(1000))] to test out all approaches? You may vary that 1000 there. It selects rows of arr in random order with repeats and gets us a (1000,5) shaped array.
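A rough sketch of how such a test input could be built and timed follows. The three-row `arr` is a shortened copy of the sample above, `numpy_app` is restated so the snippet runs on its own, and the row count `n` and the `timeit` repeat count are arbitrary assumptions.

```python
import timeit
import numpy as np

def numpy_app(arr):
    # Same idea as the first function in this answer.
    a = np.char.partition(arr[:, -1], '_')
    sidx = np.lexsort(a[:, 2::-2].astype(int).T)
    return arr[sidx]

# Shortened copy of the sample input, for illustration only.
arr = np.array([['65.0', '30.0', '20.0', '0.0', '9_49'],
                ['2.0', '29.0', '24.0', '0.0', '1_0'],
                ['16.0', '9.0', '0.0', '9990.0', '7_203']])

n = 1000  # number of sampled rows; vary as needed
big = arr[np.random.randint(0, arr.shape[0], n)]  # random rows, with repeats
print(big.shape)  # (1000, 5)

# Time 10 calls on the larger input; absolute numbers depend on the machine.
elapsed = timeit.timeit(lambda: numpy_app(big), number=10)
print(elapsed)
```

The other functions can be timed the same way by swapping the name inside the lambda.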