Here is the initial table:

| Cust ID | Jan Transaction Fee | Jan Transaction Fee | Jan Product Fee | Jan Product Fee | Feb Transaction Fee | Feb Transaction Fee | Feb Product Fee | Feb Product Fee |
|---|---|---|---|---|---|---|---|---|
|  | HKD | USD | HKD | USD | HKD | USD | HKD | USD |
| 100103 | 100 | 20 | 21 | 24 | 215 | 55 | 253 | 25 |
| 100104 | 200 | 30 | 31 | 34 | 315 | 65 | 353 | 35 |

I would like to convert the table above into the following expected result:

| Cust ID | Period | Type | FX | Price |
|---|---|---|---|---|
| 100103 | 202201 | Transaction Fee | HKD | 100 |
| 100103 | 202201 | Transaction Fee | USD | 20 |
| 100103 | 202201 | Product Fee | HKD | 21 |
| 100103 | 202201 | Product Fee | USD | 24 |
| 100103 | 202202 | Transaction Fee | HKD | 215 |
| 100103 | 202202 | Transaction Fee | USD | 55 |
| 100103 | 202202 | Product Fee | HKD | 253 |
| 100103 | 202202 | Product Fee | USD | 25 |
| 100104 | 202201 | Transaction Fee | HKD | 200 |
| 100104 | 202201 | Transaction Fee | USD | 30 |
| 100104 | 202201 | Product Fee | HKD | 31 |
| 100104 | 202201 | Product Fee | USD | 34 |
| 100104 | 202202 | Transaction Fee | HKD | 315 |
| 100104 | 202202 | Transaction Fee | USD | 65 |
| 100104 | 202202 | Product Fee | HKD | 353 |
| 100104 | 202202 | Product Fee | USD | 35 |

My code to build the sample data is below:

import pandas as pd

# Row 0 holds the currency (HKD/USD) that belongs to each fee column
test = pd.DataFrame({
    'Cust ID': ['', '100103', '100104'],
    'Jan Transaction Fee': ['HKD', 100, 200], 'Jan Transaction Fee.1': ['USD', 20, 30],
    'Jan Product Fee': ['HKD', 21, 31], 'Jan Product Fee.1': ['USD', 24, 34],
    'Feb Transaction Fee': ['HKD', 215, 315], 'Feb Transaction Fee.1': ['USD', 55, 65],
    'Feb Product Fee': ['HKD', 253, 353], 'Feb Product Fee.1': ['USD', 25, 35],
})

test

Is there a way to produce the expected result using Python?

  • Do you really have the '.1' in the input data column names? Commented Apr 6, 2022 at 12:34
  • You need pivot. Search for examples on this page. Commented Apr 6, 2022 at 12:35
  • @mozway If there is a way to do it without '.1', then ignore the '.1' in the column names. Thanks so much for your help. Commented Apr 6, 2022 at 12:49
  • Looks like a MultiIndex. Pulled from an Excel file? How was the data read? Commented Apr 6, 2022 at 13:01
  • @sammywemmy Yes, the dataset is pulled from an Excel file, but it is confidential, so I just manipulated the data first and showed a few customer IDs as an example. Commented Apr 6, 2022 at 15:09

1 Answer


This is a complex reshape.

NB. I ignored the '.1' suffixes and removed them first using test.columns = test.columns.map(lambda s: s.strip('.1')).
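A note on removing the suffix: str.strip('.1') treats its argument as a set of characters to trim from both ends, so it works for these column names but would also trim a name that legitimately starts or ends with '1' or '.'. The sketch below compares it with an anchored regular expression. The 'Tier 1' column names are hypothetical and only there to show the difference.

```python
import pandas as pd

# 'Tier 1' / 'Tier 1.1' are hypothetical names, not part of the question's data
cols = pd.Index(['Cust ID', 'Jan Transaction Fee', 'Jan Transaction Fee.1',
                 'Tier 1', 'Tier 1.1'])

# str.strip('.1') trims any leading/trailing '.' and '1' characters
stripped = cols.map(lambda s: s.strip('.1'))

# An anchored regex removes only a trailing '.<digits>' suffix
cleaned = cols.str.replace(r'\.\d+$', '', regex=True)

print(list(stripped))
print(list(cleaned))
```

With the question's column names both variants give the same result; they only diverge for names like the hypothetical 'Tier 1'.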

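df = (test
 .T.set_index(0, append=True).T   # move row 0 (the currencies) into a second column level
 .set_index([('Cust ID', '')])    # use the customer ID as the row index
 .stack()                         # currency level -> rows
 .rename_axis(index=['Cust ID', 'FX'], columns='Type')
 .stack()                         # '<month> <fee type>' labels -> rows
 .reset_index(name='Price')
)

# split e.g. 'Jan Transaction Fee' into Period='Jan' and Type='Transaction Fee'
df[['Period', 'Type']] = df['Type'].str.split(n=1, expand=True)
# convert the month abbreviation to YYYYMM, assuming the year 2022
df['Period'] = pd.to_datetime('2022 '+df['Period']).dt.strftime('%Y%m')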

output:

  Cust ID   FX             Type Price  Period
0  100103  HKD  Transaction Fee   100  202201
1  100103  USD      Product Fee    25  202202
2  100103  USD  Transaction Fee    55  202202
3  100103  USD      Product Fee    24  202201
4  100103  USD  Transaction Fee    20  202201
5  100104  HKD  Transaction Fee   200  202201
6  100104  USD      Product Fee    35  202202
7  100104  USD  Transaction Fee    65  202202
8  100104  USD      Product Fee    34  202201
9  100104  USD  Transaction Fee    30  202201
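For comparison, here is a sketch of a different route to the layout requested in the question, using melt on a two-level column index instead of a double stack. It is not the code from the answer above. It assumes the sample frame from the question, that every fee column label has the form '<month abbreviation> <fee type>', and that the year is 2022 (the data itself carries no year). The names wide, long_df and result are only illustrative.

```python
import pandas as pd

# Sample data from the question: row 0 holds the currency for each column
test = pd.DataFrame({
    'Cust ID': ['', '100103', '100104'],
    'Jan Transaction Fee': ['HKD', 100, 200], 'Jan Transaction Fee.1': ['USD', 20, 30],
    'Jan Product Fee': ['HKD', 21, 31], 'Jan Product Fee.1': ['USD', 24, 34],
    'Feb Transaction Fee': ['HKD', 215, 315], 'Feb Transaction Fee.1': ['USD', 55, 65],
    'Feb Product Fee': ['HKD', 253, 353], 'Feb Product Fee.1': ['USD', 25, 35],
})

YEAR = '2022'  # assumption: the year is not present in the data

# Drop the currency row and index the remaining rows by customer ID
fee_cols = test.columns[1:]
wide = test.iloc[1:].set_index('Cust ID')

# Pair each label (without its '.N' suffix) with its currency from row 0
labels = fee_cols.str.replace(r'\.\d+$', '', regex=True)
currencies = test.loc[0, fee_cols].tolist()
wide.columns = pd.MultiIndex.from_arrays([labels, currencies], names=['Label', 'FX'])

# One row per (customer, label, currency); melt keeps the original column order
long_df = wide.melt(ignore_index=False, value_name='Price').reset_index()

# Split 'Jan Transaction Fee' into month and fee type, then format the period
parts = long_df['Label'].str.split(n=1, expand=True)
long_df['Period'] = pd.to_datetime(YEAR + ' ' + parts[0], format='%Y %b').dt.strftime('%Y%m')
long_df['Type'] = parts[1]

# Stable sort groups rows by customer without disturbing the column order
result = long_df.sort_values('Cust ID', kind='stable')
result = result[['Cust ID', 'Period', 'Type', 'FX', 'Price']].reset_index(drop=True)
print(result)
```

The stable sort is what keeps the rows of each customer in the same order as the columns of the input table, which is the order the question asks for.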