This follows on from my previous question (linked below), where I wanted the columns sorted alphanumerically. That index manipulation now works.

I'd now like to give the data frame an additional, secondary index level (customer category) and sort the customer names alphabetically within each category.

I was thinking of creating a dictionary that maps each customer name to a category and then sorting alphabetically. I'm not sure whether that works or how to implement it.

  • I want to sort alphabetically by two_idx first and then by name

This is the current code:

import numpy as np
import pandas as pd
import natsort as ns

df = df.pivot_table(index=['name'], columns=['Duration'],
                    aggfunc={'sum': np.sum}, fill_value=0)

# Sort index values - Duration
c = df.columns.levels[1]
c = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())

# Reindex Duration values after sorting
df_ = df.reindex(pd.MultiIndex.from_product([df.columns.levels[0], c]), axis=1)

map_dict = {
            'Invoice A': 'A1. Retail',
            'Invoice B': 'A1. Retail',
            'Invoice Z': 'A1. Retail',
            'Invoice C': 'C1. Plastics',
            'Invoice F': 'C1. Plastics',
            'Invoice D': 'C2. Electronics',
            'Invoice J': 'C2. Electronics'
            }

# New Column - later to be converted to a secondary index
df['two_idx'] = df.index.to_series().map(map_dict)
df = df.sort_values(['two_idx'], ascending=[False]).set_index(['two_idx', 'name'])

Output of df.columns:

MultiIndex(levels=[[u'sum', u'two_idx'], [u'0', u'1', u'10', u'11', u'2', u'2Y', u'3', u'3Y', u'4', u'4Y', u'5', u'5Y', u'6', u'7', u'8', u'9', u'9Y', u'']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 1, 4, 6, 8, 10, 2, 3, 5, 7, 9, 11, 16, 17]])

The output I'm looking for is:

Duration                          2      2Y       3     3Y
two_idx          name
A1. Retail       Invoice A    25.50    0.00    0.00  20.00
A1. Retail       Invoice B    50.00   25.00  -10.50   0.00
C1. Plastics     Invoice C   125.00    0.00   11.20   0.50
C2. Electronics  Invoice D     0.00   15.00    0.00  80.10

Previous question: Data Manipulation - Sort Index when values are Alphanumeric

1 Answer

I believe you need DataFrame.sort_index:

import natsort as ns
import pandas as pd

# pass values= so the pivoted columns are a flat Index, not a MultiIndex
df = df.pivot_table(index='name',
                    columns='Duration',
                    values='sum',
                    aggfunc='sum',
                    fill_value=0)

# natural sort, then move the pure-digit labels ahead of the others
# https://stackoverflow.com/a/47240142/2901002
c = sorted(ns.natsorted(df.columns), key=lambda x: not x.isdigit())
df = df.reindex(c, axis=1)
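
If natsort is not available, the same column order can probably be reproduced with a plain sort key. This is only a sketch under the assumption that every Duration label is either a number or a number followed by a suffix such as Y; `duration_key` is a hypothetical helper name, not part of pandas or natsort.

```python
import re


def duration_key(label):
    """Sort key: pure-digit labels first (numerically), then suffixed ones."""
    match = re.match(r"(\d+)(.*)", label)
    if match is None:
        # Labels without a leading number go last, in plain string order.
        return (2, 0, label)
    number, suffix = int(match.group(1)), match.group(2)
    # Group 0 = digits only, group 1 = digits plus a suffix such as 'Y'.
    return (0 if suffix == "" else 1, number, suffix)


labels = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '9', '9Y']
ordered = sorted(labels, key=duration_key)
print(ordered)
```

The resulting list can be passed to df.reindex(ordered, axis=1) in place of c.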

map_dict = {
            'Invoice A': 'A1. Retail',
            'Invoice B': 'A1. Retail',
            'Invoice Z': 'A1. Retail',
            'Invoice C': 'C1. Plastics',
            'Invoice F': 'C1. Plastics',
            'Invoice D': 'C2. Electronics',
            'Invoice J': 'C2. Electronics'
            }

# build a two-level MultiIndex (category, customer name) and assign it back
df.index = pd.MultiIndex.from_arrays([df.rename(map_dict).index,
                                      df.index], names=['name','one'])

# sort by category first, then by customer name within each category
df = df.sort_index()
print(df)
                               2     3    2Y    3Y
name            one                               
A1. Retail      Invoice A   25.5   0.0   0.0  20.0
                Invoice B   50.0 -10.5  25.0   0.0
C1. Plastics    Invoice C  125.0  11.2   0.0   0.5
C2. Electronics Invoice D    0.0   0.0  15.0  80.1
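
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The toy frame is an assumption that reuses a few values from the output above, and the level names two_idx and name follow the question rather than the code above. Index.map is used instead of rename, which should be equivalent here; names missing from map_dict would come out as NaN, so the sketch checks for them first.

```python
import pandas as pd

# Toy stand-in for the pivoted frame, deliberately in unsorted row order.
df = pd.DataFrame(
    {'2': [0.0, 50.0, 125.0, 25.5], '2Y': [15.0, 25.0, 0.0, 0.0]},
    index=pd.Index(['Invoice D', 'Invoice B', 'Invoice C', 'Invoice A'],
                   name='name'),
)

map_dict = {
    'Invoice A': 'A1. Retail',
    'Invoice B': 'A1. Retail',
    'Invoice C': 'C1. Plastics',
    'Invoice D': 'C2. Electronics',
}

# Look up the category for every customer name in the index.
categories = df.index.map(map_dict)

# Names absent from map_dict map to NaN; list them before sorting.
unmapped = df.index[categories.isna()].tolist()

# Category becomes the outer level, customer name the inner level.
df.index = pd.MultiIndex.from_arrays([categories, df.index],
                                     names=['two_idx', 'name'])

# sort_index orders by the outer level first, then by the inner level.
df = df.sort_index()
print(df)
```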