
I'm looking for a way to sort a dataframe by a column that consists of arrays. Below is my dataframe, with the index, the arrays (a) and the values (b).

index    a   b
0       [0]  0.014066
1       [1]  0.569054
2       [2]  0.379795
3       [3]  0.037084
4       [4]  0.699488
5       [5]  0.191816
6       [6]  0.107417
7    [0, 4]  0.008951
8    [0, 5]  0.002558
9    [0, 6]  0.002558
10   [1, 4]  0.448849
11   [1, 5]  0.089514
12   [1, 6]  0.030691
13   [2, 4]  0.217391
14   [2, 5]  0.095908
15   [2, 6]  0.066496
16   [3, 4]  0.024297
17   [3, 5]  0.003836
18   [3, 6]  0.007673
19   [0, 3]  0.000000
20   [1, 3]  0.000000
21   [2, 3]  0.000000

As you can see, the last three arrays are not in the same order as the others. This is what I would like instead:

index    a   b
0       [0]  0.014066
1       [1]  0.569054
2       [2]  0.379795
3       [3]  0.037084
4       [4]  0.699488
5       [5]  0.191816
6       [6]  0.107417
-> [0,3] here
7    [0, 4]  0.008951
8    [0, 5]  0.002558
9    [0, 6]  0.002558
-> [1,3] here
10   [1, 4]  0.448849
11   [1, 5]  0.089514
12   [1, 6]  0.030691
-> [2,3] here
13   [2, 4]  0.217391
14   [2, 5]  0.095908
15   [2, 6]  0.066496
16   [3, 4]  0.024297
17   [3, 5]  0.003836
18   [3, 6]  0.007673

I hope that makes sense. df.sort_values('a') doesn't seem to work on the arrays in a; sorting only works on the values in b. Thanks in advance!

    Btw, providing a method to easily recreate the data would be appreciated. Those lists make it annoying. Commented May 18, 2018 at 14:57
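For anyone who wants to try the answers below, here is one way to rebuild the frame. The values are copied from the table in the question; storing column a as plain Python lists (rather than NumPy arrays) is an assumption, since the question does not say which it is.

```python
import pandas as pd

# Column "a": values copied from the question, assumed to be Python lists.
a = [[0], [1], [2], [3], [4], [5], [6],
     [0, 4], [0, 5], [0, 6],
     [1, 4], [1, 5], [1, 6],
     [2, 4], [2, 5], [2, 6],
     [3, 4], [3, 5], [3, 6],
     [0, 3], [1, 3], [2, 3]]

# Column "b": values copied from the question, in the same row order.
b = [0.014066, 0.569054, 0.379795, 0.037084, 0.699488, 0.191816, 0.107417,
     0.008951, 0.002558, 0.002558,
     0.448849, 0.089514, 0.030691,
     0.217391, 0.095908, 0.066496,
     0.024297, 0.003836, 0.007673,
     0.000000, 0.000000, 0.000000]

df = pd.DataFrame({'a': a, 'b': b})
df.index.name = 'index'  # matches the header shown in the question
```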

3 Answers


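Using the sample data from jpp's answer:

import pandas as pd
from natsort import natsorted

s = pd.Series([[0], [1], [2], [0, 4], [3, 6], [0, 3]])
natsorted(s)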
Out[940]: [[0], [0, 3], [0, 4], [1], [2], [3, 6]]

Update: sort by length first, then by the list itself

s.iloc[natsorted(range(len(s)), key=lambda k: (len(s[k]),s[k]))]
Out[997]: 
0       [0]
1       [1]
2       [2]
5    [0, 3]
3    [0, 4]
4    [3, 6]
dtype: object

1 Comment

Thanks for the response. By the looks of the output, this puts [0, 3] before [1] etc. which wasn't exactly what I was looking for, or am I getting it wrong? But I'll definitely remember natsort.
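To illustrate the difference this comment points at: Python compares lists element by element, so a plain sort puts [0, 3] right after [0], while a key of (length, list) groups the one-element lists first. A minimal sketch in plain Python, using the same sample lists as jpp's answer:

```python
data = [[0], [1], [2], [0, 4], [3, 6], [0, 3]]

# Plain lexicographic order: compares element by element,
# so [0, 3] and [0, 4] land between [0] and [1].
lexicographic = sorted(data)
print(lexicographic)
# [[0], [0, 3], [0, 4], [1], [2], [3, 6]]

# Length first, then contents: all one-element lists come
# before any two-element list.
by_length_then_value = sorted(data, key=lambda v: (len(v), v))
print(by_length_then_value)
# [[0], [1], [2], [0, 3], [0, 4], [3, 6]]
```

The second ordering is the one asked for in the question.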

Credit to @jpp for getting me straight on using len

Use loc with sorted and a key argument:

m = {k: (len(v), tuple(v)) for k, v in df.a.items()}
df.loc[sorted(df.index, key=m.get)]

            a         b
index                  
0         [0]  0.014066
1         [1]  0.569054
2         [2]  0.379795
3         [3]  0.037084
4         [4]  0.699488
5         [5]  0.191816
6         [6]  0.107417
19     [0, 3]  0.000000
7      [0, 4]  0.008951
8      [0, 5]  0.002558
9      [0, 6]  0.002558
20     [1, 3]  0.000000
10     [1, 4]  0.448849
11     [1, 5]  0.089514
12     [1, 6]  0.030691
21     [2, 3]  0.000000
13     [2, 4]  0.217391
14     [2, 5]  0.095908
15     [2, 6]  0.066496
16     [3, 4]  0.024297
17     [3, 5]  0.003836
18     [3, 6]  0.007673

Old Answer

df.loc[sorted(df.index, key=lambda i: (lambda t: (len(t), tuple(t)))(df.at[i, 'a']))]

            a         b
index                  
0         [0]  0.014066
1         [1]  0.569054
2         [2]  0.379795
3         [3]  0.037084
4         [4]  0.699488
5         [5]  0.191816
6         [6]  0.107417
19     [0, 3]  0.000000
7      [0, 4]  0.008951
8      [0, 5]  0.002558
9      [0, 6]  0.002558
20     [1, 3]  0.000000
10     [1, 4]  0.448849
11     [1, 5]  0.089514
12     [1, 6]  0.030691
21     [2, 3]  0.000000
13     [2, 4]  0.217391
14     [2, 5]  0.095908
15     [2, 6]  0.066496
16     [3, 4]  0.024297
17     [3, 5]  0.003836
18     [3, 6]  0.007673
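Both versions above look rows up by index label, so they assume the labels are unique. If that is not guaranteed, a positional variant may be safer. The sketch below uses the same (length, tuple) key; the helper name sort_by_list_column and the small demo frame (including its b values) are made up for illustration and are not from the question.

```python
import pandas as pd

def sort_by_list_column(frame, column):
    """Hypothetical helper: order rows by (length, contents) of a list-valued column."""
    values = frame[column].tolist()
    # Sort row positions, not labels, so duplicate index labels are harmless.
    order = sorted(range(len(values)),
                   key=lambda i: (len(values[i]), tuple(values[i])))
    return frame.iloc[order]

# Illustrative data only; the index has duplicate labels on purpose.
demo = pd.DataFrame(
    {'a': [[0], [1], [0, 4], [3, 6], [0, 3]],
     'b': [0.1, 0.2, 0.3, 0.4, 0.5]},
    index=[0, 1, 1, 2, 2],
)

result = sort_by_list_column(demo, 'a')
print(result['a'].tolist())
# [[0], [1], [0, 3], [0, 4], [3, 6]]
```

With unique labels this should give the same row order as the df.loc version; tuple() is used so the key also works if the column holds NumPy arrays instead of lists.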

1 Comment

Thank you very much for the response. Your solution worked perfectly.

Looks like you need to sort by length of list, then by the list itself.

You can do this with numpy.lexsort. Here's a minimal example.

import numpy as np
import pandas as pd

s = pd.Series([[0], [1], [2], [0, 4], [3, 6], [0, 3]])

res = np.lexsort((s, s.str.len()))

# array([0, 1, 2, 5, 3, 4], dtype=int64)

So you can do this with your dataframe:

df = df.iloc[np.lexsort((df['a'], df['a'].str.len()))]

Just be careful: np.lexsort reads its keys from right to left, so with the logic above the last key (the length) is the primary sort key and the lists themselves only break ties.
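A small sketch of that right-to-left rule, using plain numeric arrays instead of lists so the effect of the key order is easy to see. The arrays lengths and values are made-up illustration data.

```python
import numpy as np

# Illustrative keys: "lengths" plays the role of len(list),
# "values" plays the role of the list contents.
lengths = np.array([2, 1, 2, 1])
values = np.array([0, 3, 1, 2])

# Last key is primary: sort by lengths, break ties with values.
by_length_first = np.lexsort((values, lengths))
print(by_length_first)   # [3 1 0 2]

# Swapping the keys makes values the primary key instead.
by_value_first = np.lexsort((lengths, values))
print(by_value_first)    # [0 2 3 1]
```

Getting the tuple order wrong does not raise an error, it just sorts by the wrong column first, so it is worth checking on a small example like this.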
