Let's say I have this dataframe.

import numpy as np
import pandas as pd

df = pd.DataFrame([['A-store',5,'B-store',4,'C-store',6],
                   ['B-store',3,'P-store',4,np.nan,np.nan],
                   ['N-store',20,np.nan,np.nan,'I-store',9],
                   ['L-store',8,'N-store',2,'A-store',5]],
           columns=['store_1','time_1','store_2','time_2','store_3','time_3'])
print(df)
   store_1  time_1  store_2  time_2  store_3  time_3
0  A-store       5  B-store     4.0  C-store     6.0
1  B-store       3  P-store     4.0      NaN     NaN
2  N-store      20      NaN     NaN  I-store     9.0
3  L-store       8  N-store     2.0  A-store     5.0

Ex: To get to the A-store it takes 5 minutes.

How can I sort the (store, time) pairs within each row so that the leftmost pair has the shortest time and the rightmost pair has the longest? Each pair spans two columns, so the sort has to move both columns together. The data also contains NaN pairs, which should end up on the right.

Here is the ideal output.

shorter <----------------------------------->  longer
   store_1  time_1  store_2  time_2  store_3  time_3
0  B-store     4.0  A-store       5  C-store     6.0
1  B-store       3  P-store     4.0      NaN     NaN
2  I-store     9.0  N-store      20      NaN     NaN
3  N-store     2.0  A-store     5.0  L-store       8

I could probably pivot or stack and then sort within each row, but I'm not sure how to do this.

If anyone has any good ideas or code, let me know.

Thanks!

2 Answers

The idea is to split the column names with str.split into a two-level MultiIndex, reshape with DataFrame.stack, sort by the original row label and the time column, number the sorted pairs within each row with GroupBy.cumcount, and finally reshape back to the original layout:

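# Split 'store_1' into ('store', '1') so the columns become a two-level MultiIndex
df.columns = df.columns.str.split('_', expand=True)

# Stack the numbered level into rows, drop it, and sort each original row by time
df1 = (df.stack()
         .reset_index(level=1, drop=True)
         .rename_axis('lvl1')
         .sort_values(['lvl1', 'time']))
# Number the sorted pairs 1, 2, 3, ... within each original row
df1 = df1.set_index(df1.groupby(level=0).cumcount().add(1), append=True)

# Move the new numbering back into the columns and flatten the column names
df1 = df1.unstack().sort_index(axis=1, level=1).rename_axis(None)
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
print(df1)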
   store_1  time_1  store_2  time_2  store_3  time_3
0  B-store     4.0  A-store     5.0  C-store     6.0
1  B-store     3.0  P-store     4.0      NaN     NaN
2  I-store     9.0  N-store    20.0      NaN     NaN
3  N-store     2.0  A-store     5.0  L-store     8.0
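
The renumbering step is the least obvious part, so here is a minimal sketch of it in isolation. The frame `long_df` is a hand-built stand-in for the stacked data (it is not produced by the code above), and its values are only illustrative.

```python
import pandas as pd

# Hypothetical long-format data: one (store, time) pair per line,
# indexed by the label of the row it came from.
long_df = pd.DataFrame(
    {"store": ["A-store", "B-store", "C-store", "N-store", "I-store"],
     "time": [5, 4, 6, 20, 9]},
    index=pd.Index([0, 0, 0, 2, 2], name="lvl1"),
)

# Sort by the row label first, then by time within each row.
long_df = long_df.sort_values(["lvl1", "time"])

# cumcount numbers the lines of each group 0, 1, 2, ...; add(1) makes it 1-based.
rank = long_df.groupby(level=0).cumcount().add(1)

print(long_df.assign(rank=rank))
```

After sorting, `rank` restarts at 1 for every original row, which is what lets `unstack` place the shortest pair in the `_1` columns.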

This may be a longer way of doing it, and someone may be able to give you a better approach, but it gives the output that you need.

import pandas as pd
import numpy as np
import operator

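def func(lst):
    # Pair each store with its time; .iloc keeps the lookup positional
    d = {lst.iloc[i]: lst.iloc[i + 1] for i in range(0, len(lst), 2)}
    # Sort the (store, time) pairs by time
    d = sorted(d.items(), key=operator.itemgetter(1))
    # Flatten back to [store, time, store, time, ...]
    return [val for sublist in d for val in sublist]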

df = pd.DataFrame([['A-store',5,'B-store',4,'C-store',6], \
                   ['B-store',3,'P-store',4,np.nan,np.nan], \
                   ['N-store',20,np.nan,np.nan,'I-store',9], \
                   ['L-store',8,'N-store',2,'A-store',5]],
           columns=['store_1','time_1','store_2','time_2','store_3','time_3'])

pd.DataFrame.from_records(df.apply(func, axis=1).tolist(),
                          columns=['store_1','time_1','store_2','time_2','store_3','time_3'])

This returns the following output:

    store_1 time_1  store_2 time_2  store_3 time_3
0   B-store 4.0     A-store 5.0     C-store 6.0
1   B-store 3.0     P-store 4.0     NaN     NaN
2   N-store 20.0    NaN     NaN     I-store 9.0
3   N-store 2.0     A-store 5.0     L-store 8.0
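
In the output above, row 2 keeps its NaN pair in the middle, because comparisons against NaN are always false and `sorted` therefore leaves that row untouched. The following is a minimal sketch of a row-wise variant that pushes NaN pairs to the right. The helper name `sort_pairs` is hypothetical, and the sketch assumes the columns always alternate store, time.

```python
import numpy as np
import pandas as pd

def sort_pairs(values):
    """Sort the (store, time) pairs of one row by time, NaN pairs last."""
    pairs = list(zip(values[0::2], values[1::2]))
    # Key is (is_missing, time). Missing times use 0 as a placeholder,
    # so NaN itself is never compared.
    pairs.sort(key=lambda p: (pd.isna(p[1]), 0 if pd.isna(p[1]) else p[1]))
    return [v for pair in pairs for v in pair]

df = pd.DataFrame([['A-store', 5, 'B-store', 4, 'C-store', 6],
                   ['B-store', 3, 'P-store', 4, np.nan, np.nan],
                   ['N-store', 20, np.nan, np.nan, 'I-store', 9],
                   ['L-store', 8, 'N-store', 2, 'A-store', 5]],
                  columns=['store_1', 'time_1', 'store_2', 'time_2', 'store_3', 'time_3'])

# Process each row as a plain list, then rebuild the frame with the same labels.
rows = [sort_pairs(list(r)) for r in df.itertuples(index=False, name=None)]
result = pd.DataFrame(rows, columns=df.columns, index=df.index)
print(result)
```

Using a list of pairs instead of a dict also avoids losing a pair when the same store name (or NaN) appears twice in one row.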

1 Comment

Thanks for the answer. I'll take a look at yours as well.
