Sort child rows of parent rows in python

Question

I have a dataframe; a sample looks like this:

Type       SKU   Description  FullDescription  Size     Price
Variable   2     Boots        Shoes on sale    XL,S,M
Variation  2.5   Boots XL                      XL       330
Variation  2.6   Boots S                       S        330
Variation  2.7   Boots M                       M        330
Variable   3     Helmet       Helmet Sizes     E42,E41
Variation  3.8   Helmet E42                    E42      89
Variation  3.2   Helmet E41                    E41      89

I want to sort the rows by Size, so the final dataframe should look like this:

Type       SKU   Description  FullDescription  Size     Price
Variable   2     Boots        Shoes on sale    S,M,XL
Variation  2.6   Boots S                       S        330
Variation  2.7   Boots M                       M        330
Variation  2.5   Boots XL                      XL       330
Variable   3     Helmet       Helmet Sizes     E41,E42
Variation  3.2   Helmet E41                    E41      89
Variation  3.8   Helmet E42                    E42      89

I could just use sort_values(), but I can't find a way to keep each Variable row together with, and above, its own Variation rows (i.e. to preserve the Type/SKU grouping).

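import pandas as pd
# group id = running count of 'Variable' rows; each group's parent Size list and child rows are reversed
out = df.groupby(df.Type.eq('Variable').cumsum()).apply(
    lambda x: pd.concat([x.iloc[[0]].assign(Size=lambda y: y['Size'].str.split(',').str[::-1].str.join(',')),
                         x.iloc[1:].iloc[::-1]]))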

I have tried this code, but on the large dataset it prints variations before variables, and in reverse order. Please note that Size is not limited to values like 'XL,M,S' and 'E42,E41'; it also has values like 5XXL, 39mm, etc. Any help would be appreciated.
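The attempt above only reverses each list ([::-1]); it never compares sizes, so 'XL,S,M' becomes 'M,S,XL' and the result depends on the input order. Plain string sorting is not enough either, because the labels mix letter sizes with numeric ones ('39mm' sorts before '9mm' as text). A minimal sketch of a sort key for individual labels, assuming letter sizes follow a fixed list and everything else is sorted naturally (embedded numbers compared as numbers). LETTER_ORDER, size_key and sort_sizes are hypothetical names, and the list would need labels such as 5XXL added wherever they belong:

```python
import re

# Assumed order for letter sizes (hypothetical; extend for labels like '5XXL').
# Labels not listed here fall back to a natural, number-aware sort.
LETTER_ORDER = ["XXS", "XS", "S", "M", "L", "XL", "XXL", "XXXL"]


def natural_parts(label):
    """Split 'E41' -> ('E', 41, '') so embedded numbers compare numerically."""
    parts = re.split(r"(\d+)", label)
    # re.split with a capturing group alternates text, number, text, ...
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def size_key(label):
    """Sort key: known letter sizes first (in LETTER_ORDER), then everything else."""
    s = label.strip().upper()
    if s in LETTER_ORDER:
        return (0, (LETTER_ORDER.index(s),))
    return (1, natural_parts(s))


def sort_sizes(csv):
    """Re-order a comma-separated size string such as 'XL,S,M'."""
    return ",".join(sorted((p.strip() for p in csv.split(",")), key=size_key))


print(sort_sizes("XL,S,M"))    # S,M,XL
print(sort_sizes("E42,E41"))   # E41,E42
print(sort_sizes("39mm,9mm"))  # 9mm,39mm
```

The same key can sort the child rows, e.g. by mapping it over the single-label Size values before calling sort_values.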

Edit:

grp = df.groupby('Type').cumcount()

       Type SKU Description FullDescription Size    Price
0   variable    2.0 Boots   Shoes on sale   S,M,XL  NaN
2   variation   2.6 Boots S           NaN   S       330.0
4   variable    3.0 Helmet  Helmet Sizes    E41,E42 NaN
3   variation   2.7 Boots M           NaN   M       330.0
1   variation   2.5 Boots XL          NaN   XL      330.0
6   variation   3.2 Helmet E41        NaN   E41     123.0
5   variation   3.8 Helmet E42        NaN   E42     112.0
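The output above lists Type in lowercase ('variable', 'variation'), whereas the sample uses 'Variable'/'Variation'. If the full dataset really is lowercase, a case-sensitive test such as df['Type'] == 'Variable' matches no row, so every row lands in one group and a later sort by Type lists all parents before all children. Separately, groupby('Type').cumcount() numbers rows within each Type independently, so it does not tie a variation to the parent above it. A minimal check of both points on a hand-built frame (the lowercase values are an assumption taken from the posted output):

```python
import pandas as pd

# Hand-built frame mimicking the sample, with lowercase Type values.
df = pd.DataFrame({
    "Type": ["variable", "variation", "variation", "variation",
             "variable", "variation", "variation"],
    "SKU": [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
})

# Numbers rows within each Type separately: parents 0, 1; children 0..4.
by_type = df.groupby("Type").cumcount()

# Case-sensitive test never matches lowercase 'variable': all rows get 0.
case_sensitive = (df["Type"] == "Variable").cumsum()

# Case-insensitive running count of parents: each child gets its parent's id.
parent_id = df["Type"].str.lower().eq("variable").cumsum()

print(pd.DataFrame({"by_type": by_type,
                    "case_sensitive": case_sensitive,
                    "parent_id": parent_id}))
```

Only parent_id gives every variation the same number as its parent (1 for the boots, 2 for the helmet).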

David Erickson · Accepted Answer · 2021-01-07 01:48:02Z


You can temporarily replace the size labels with values that sort in the order you want, then change them back at the end by replacing the dig values with the sizes. The dig values are arbitrary, but make sure they are in the desired order.
Temporarily create a grp column (a running count of Variable rows) so each parent and its children stay together when sorting.
Use a lambda function to split each comma-separated string into a list, sort it, and join it back into a string.
Replace the dig values back with the sizes.

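import pandas as pd

# Stand-in values that sort in the desired order; make sure dig values do not exist as a substring anywhere in your dataframe
sizes, dig = ['S', 'M', 'XL', 'L'], ['000', '111', '333', '222']
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))  # regex=True: substring replacement
        .assign(grp=(df['Type'] == 'Variable').cumsum())           # parent id shared with its children
        .sort_values(['grp', 'Type', 'Size'])                      # 'Variable' sorts before 'Variation'
        .drop('grp', axis=1))
# Sort each comma-separated list, then swap the dig values back to sizes
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)
df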
Out[1]: 
        Type  SKU Description FullDescription     Size  Price
0   Variable  2.0       Boots   Shoes on sale   S,M,XL    NaN
2  Variation  2.6     Boots S             NaN        S  330.0
3  Variation  2.7     Boots M             NaN        M  330.0
1  Variation  2.5    Boots XL             NaN       XL  330.0
4   Variable  3.0      Helmet    Helmet Sizes  E41,E42    NaN
6  Variation  3.2  Helmet E41             NaN      E41   89.0
5  Variation  3.8  Helmet E42             NaN      E42   89.0
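A sketch of the same idea without the temporary string replacement: rank each size with an explicit mapping, sort on that rank, and re-order the comma-separated list on parent rows. Because labels are looked up whole rather than substituted with regex=True, 'S' cannot match inside another label. It also puts parents first with an explicit flag instead of relying on 'Variable' sorting before 'Variation', and compares Type case-insensitively. The ORDER dict, the helper names size_rank and sort_size_list, and placing unlisted labels after the listed ones in alphabetical order are assumptions for illustration:

```python
import pandas as pd

# The sample from the question (parent rows have no Price).
df = pd.DataFrame({
    "Type": ["Variable", "Variation", "Variation", "Variation",
             "Variable", "Variation", "Variation"],
    "SKU": [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    "Description": ["Boots", "Boots XL", "Boots S", "Boots M",
                    "Helmet", "Helmet E42", "Helmet E41"],
    "Size": ["XL,S,M", "XL", "S", "M", "E42,E41", "E42", "E41"],
    "Price": [None, 330, 330, 330, None, 89, 89],
})

# Assumed order for letter sizes; unlisted labels (E41, 39mm, ...) rank after
# these and are then ordered by their text.
ORDER = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4, "XXL": 5}
UNKNOWN = len(ORDER)


def size_rank(label):
    """Integer rank of a single size label."""
    return ORDER.get(label, UNKNOWN)


def sort_size_list(value):
    """Re-order a comma-separated size list; single labels are unchanged."""
    if not isinstance(value, str):  # leave NaN/None as they are
        return value
    parts = [p.strip() for p in value.split(",")]
    return ",".join(sorted(parts, key=lambda p: (size_rank(p), p)))


is_parent = df["Type"].str.lower().eq("variable")
out = (
    df.assign(
        Size=df["Size"].map(sort_size_list),
        grp=is_parent.cumsum(),          # id shared by a parent and its children
        child=~is_parent,                # False (parent) sorts before True
        rank=df["Size"].map(size_rank),  # rank of each child's single size
    )
    .sort_values(["grp", "child", "rank", "Size"])
    .drop(columns=["grp", "child", "rank"])
)
print(out)
```

Unlisted labels fall back to text order, so numeric sizes such as 39mm and 9mm would still need a number-aware key in place of the plain Size tie-breaker.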

edited Jan 7, 2021 at 1:48

answered Jan 7, 2021 at 1:01



5 Comments

hyeri Over a year ago

Hi, thanks for the answer. It works for the small sample, but on the original dataset it separates variables from variations: it first lists all the variables and then all the variations.

hyeri Over a year ago

Unfortunately it's not working; I have added the output to the original post above.

David Erickson Over a year ago

@hyeri thanks for that. Unfortunately, that is a significantly different problem, and significant iterative changes to a question are discouraged on Stack Overflow. My solution works on the dataset you provided. If you wouldn't mind accepting my solution and creating a follow-up question for this separate problem, it would be greatly appreciated. I have to run soon anyway, and it should only take 2 minutes to copy and paste it into a new question that links back to this one. Thank you!

hyeri Over a year ago

It does work on the data I provided. Thanks for your time and help :)

hyeri Over a year ago

Hi David, your solution works for the original dataset as well. However, when I tried to add more sizes I couldn't get the required results. I was hoping you could take a look, since it's your code: stackoverflow.com/questions/65616914/…
