
I have a dataframe; a sample looks like this:

Type          SKU      Description   FullDescription        Size      Price
Variable       2        Boots          Shoes on sale       XL,S,M       
Variation      2.5      Boots XL                             XL       330
Variation      2.6      Boots S                              S        330
Variation      2.7      Boots M                              M        330
Variable       3        Helmet           Helmet Sizes      E42,E41
Variation      3.8      Helmet E42                          E42       89
Variation      3.2      Helmet E41                          E41       89

I want to sort the rows by Size, so the final dataframe should look like this:

Type          SKU      Description   FullDescription        Size      Price
Variable       2        Boots          Shoes on sale       S,M,XL
Variation      2.6      Boots S                              S        330
Variation      2.7      Boots M                              M        330
Variation      2.5      Boots XL                             XL       330
Variable       3        Helmet           Helmet Sizes      E41,E42
Variation      3.2      Helmet E41                          E41       89
Variation      3.8      Helmet E42                          E42       89

I could just use sort_values(), but I can't find a way to retain the order of Type and SKU, i.e. each Variable row followed by its own Variation rows.

out = df.groupby(df.Type.eq('Variable').cumsum()).\
       apply(lambda x : pd.concat([x.iloc[[0]].assign(Size=lambda y : y['Size'].str.split(',').str[::-1].str.join(',')),
                        x.iloc[1:,].iloc[::-1]]))
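The attempt above reverses rather than sorts: .str[::-1] flips the split Size list and .iloc[::-1] flips the variation rows, which only gives the right result when the data happens to be in exactly the opposite order. A quick check on the Size strings from the sample (the third value is a made-up, already sorted list added for illustration):

```python
import pandas as pd

sizes = pd.Series(["XL,S,M", "E42,E41", "S,M,XL"])

# Same transformation as the attempt: split on commas, reverse, re-join.
reversed_sizes = sizes.str.split(",").str[::-1].str.join(",")

print(reversed_sizes.tolist())  # ['M,S,XL', 'E41,E42', 'XL,M,S']
```

Only the Helmet value comes out in the wanted order; "XL,S,M" becomes "M,S,XL", and a list that was already sorted gets turned around.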

I have tried this code, but on the large dataset it prints the variations before the variables, and in reverse order too. Please note that Size is not limited to values like 'XL,M,S' and 'E42,E41'; it also has values like 5XXL, 39mm, etc. Any help would be appreciated.
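Because Size mixes letter sizes (S, M, XL, 5XXL), codes (E41) and measurements (39mm), a plain string sort puts e.g. "100mm" before "39mm" and "XL" before "S". One possible approach, sketched here as an assumption rather than a known-correct rule for this data: give each label a sort key, ranking letter sizes by an explicit list and everything else by prefix and then numeric value. LETTER_ORDER, size_key and sort_size_list are hypothetical names, and the order inside LETTER_ORDER (including where 5XXL goes) is a guess you would adjust.

```python
import re

# Hypothetical size ladder: an assumption, not taken from the data.
# List every letter size you actually have (e.g. 5XXL) in the order you want.
LETTER_ORDER = ["XS", "S", "M", "L", "XL", "XXL", "3XL", "4XL", "5XXL"]

# Optional letter prefix, a number, then an optional suffix: E41, 39mm, 42.5
SIZE_PATTERN = re.compile(r"([A-Za-z]*)(\d+(?:\.\d+)?)(.*)")


def size_key(size):
    """Return a tuple that sorts one size label."""
    size = size.strip()
    if size in LETTER_ORDER:
        # Known letter sizes sort by their position in LETTER_ORDER.
        return (0, LETTER_ORDER.index(size))
    match = SIZE_PATTERN.fullmatch(size)
    if match:
        prefix, number, suffix = match.groups()
        # Codes such as E41 or 39mm sort by prefix, then numerically.
        return (1, prefix, float(number), suffix)
    # Anything unrecognised goes last, in plain string order.
    return (2, size)


def sort_size_list(sizes):
    """Sort a comma-separated size string such as 'XL,S,M'."""
    return ",".join(sorted(sizes.split(","), key=size_key))
```

Single sizes pass through sort_size_list unchanged, so it could be applied to the whole column with df["Size"].map(sort_size_list). The leading integer in each tuple keeps the three kinds of label apart, so tuples of different shapes are never compared element by element.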

Edit:

grp=(df.groupby('Type')).cumcount()



        Type  SKU Description FullDescription     Size  Price
0   variable  2.0       Boots   Shoes on sale   S,M,XL    NaN
2  variation  2.6     Boots S             NaN        S  330.0
4   variable  3.0      Helmet    Helmet Sizes  E41,E42    NaN
3  variation  2.7     Boots M             NaN        M  330.0
1  variation  2.5    Boots XL             NaN       XL  330.0
6  variation  3.2  Helmet E41             NaN      E41  123.0
5  variation  3.8  Helmet E42             NaN      E42  112.0
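Two details in the output above may explain the mixed-up order (a hedged reading, not confirmed in the thread). First, the Type values are lowercase here, so an exact comparison such as df['Type'] == 'Variable' never matches and every row ends up in the same group. Second, groupby('Type').cumcount() numbers rows within each Type separately, so it does not tie a variation to its parent. A case-insensitive group key, sketched on a small made-up frame and assuming each parent row directly precedes its own variations:

```python
import pandas as pd

# Made-up rows mimicking the lowercase Type values in the output above.
df = pd.DataFrame({
    "Type": ["variable", "variation", "variation", "variable", "variation"],
    "SKU": [2.0, 2.5, 2.6, 3.0, 3.8],
})

# Exact match: no row equals "Variable", so every row gets group 0.
exact = (df["Type"] == "Variable").cumsum()

# Case- and whitespace-insensitive match: each parent row starts a new group,
# and cumsum() carries that group number down to the variations below it.
df["grp"] = df["Type"].str.strip().str.lower().eq("variable").cumsum()
```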

Answer:

  1. Temporarily replace the sizes with codes that sort in the desired order (dig). The dig values are arbitrary; just make sure they sort in the order you want the sizes to appear.
  2. Temporarily create a grp column to sort by, so each Variable row stays with its Variation rows.
  3. Use a lambda function to split each comma-separated Size string into a list, sort it, and join it back into a string.
  4. Replace the dig values with the original sizes.

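# make sure the dig values never occur as a substring in the Size column
# XL is listed before L, so "XL" is replaced before the L pattern can touch it
sizes, dig = ['S', 'M', 'XL', 'L'], ['000', '111', '333', '222']
df = (df.assign(Size=df['Size'].replace(sizes, dig, regex=True))  # sizes -> sortable codes
        .assign(grp=(df['Type'] == 'Variable').cumsum())          # each Variable row starts a group
        .sort_values(['grp', 'Type', 'Size']).drop('grp', axis=1)) # 'Variable' sorts before 'Variation'
# sort each comma-separated list, then turn the codes back into sizes
df['Size'] = df['Size'].apply(lambda x: ','.join(sorted(x.split(',')))).replace(dig, sizes, regex=True)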
df
Out[1]: 
        Type  SKU Description FullDescription     Size  Price
0   Variable  2.0       Boots   Shoes on sale   S,M,XL    NaN
2  Variation  2.6     Boots S             NaN        S  330.0
3  Variation  2.7     Boots M             NaN        M  330.0
1  Variation  2.5    Boots XL             NaN       XL  330.0
4   Variable  3.0      Helmet    Helmet Sizes  E41,E42    NaN
6  Variation  3.2  Helmet E41             NaN      E41   89.0
5  Variation  3.8  Helmet E42             NaN      E42   89.0
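The same grouping can also be done without substituting digit codes into Size, which removes the risk of a dig value colliding with real data. A sketch using helper columns on the sample from the question; LETTER_RANK, sort_list and the helper column names are hypothetical, the rank order is an assumption, and any size not in LETTER_RANK (E41, 39mm, ...) falls back to plain string order, so numeric sizes such as 100mm vs 39mm would still need a smarter key:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["Variable", "Variation", "Variation", "Variation",
             "Variable", "Variation", "Variation"],
    "SKU": [2, 2.5, 2.6, 2.7, 3, 3.8, 3.2],
    "Description": ["Boots", "Boots XL", "Boots S", "Boots M",
                    "Helmet", "Helmet E42", "Helmet E41"],
    "Size": ["XL,S,M", "XL", "S", "M", "E42,E41", "E42", "E41"],
})

# Assumed letter-size order (hypothetical, extend as needed). Sizes not listed
# here share the rank `fallback` and are then ordered by the Size string.
LETTER_RANK = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4, "XXL": 5}
fallback = len(LETTER_RANK)

is_parent = df["Type"].str.lower().eq("variable")
out = (
    df.assign(
        grp=is_parent.cumsum(),   # a parent and its variations share a number
        is_child=~is_parent,      # False sorts first, so the parent leads
        size_rank=df["Size"].map(lambda s: LETTER_RANK.get(s, fallback)),
    )
    .sort_values(["grp", "is_child", "size_rank", "Size"])
    .drop(columns=["grp", "is_child", "size_rank"])
)


def sort_list(sizes):
    """Reorder a comma-separated size list with the same ranking."""
    parts = sizes.split(",")
    return ",".join(sorted(parts, key=lambda s: (LETTER_RANK.get(s, fallback), s)))


# Parent rows hold lists like "XL,S,M"; single sizes pass through unchanged.
out["Size"] = out["Size"].map(sort_list)
```

On this sample the rows come out in index order 0, 2, 3, 1, 4, 6, 5, matching the expected output in the question.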

Comments:

Hi, thanks for the answer. It works for the small sample, but on the original dataset it separates the variables from the variations: it first lists all the variables and then all the variations.
Unfortunately it's not working; I have added the output to the original post above.
@hyeri thanks for that. Unfortunately, that is a significantly different problem to solve, and making significant iterative changes to a question is discouraged on StackOverflow. My solution works on the dataset you provided. If you wouldn't mind accepting my solution and creating a follow-up question for this separate problem, it would be greatly appreciated. I have to leave soon anyway, and it should only take a couple of minutes to copy and paste it into a new question that references this one. Thank you!
It does work on the data I provided. Thanks for your time and help :)
Hi David, your solution works for the original dataset as well. However, I tried adding more sizes and couldn't get the required results. I was hoping you could take a look, since it uses your code: stackoverflow.com/questions/65616914/…
