I am relatively new to Python (pandas), which I would like to use to automate Excel tasks and work more efficiently :)
Currently I am working with the Excel sales report below, where each year is a merged cell spanning its month columns.
| | 2018 | | | | 2019 | | | |
| Product | January | February | March | April | January | February | March | April |
| A | 8 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
| B | 9 | 10 | 65 | 50 | 8 | 63 | 65 | 50 |
| C | 7 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
| D | 8 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
Now I would like to reshape the report into a stacked format, which I can then write back to Excel, and use for further analysis:
Product | Year | Month | Values
A | 2018 | January | 8
B | 2018 | January | 9
My idea was to create a DataFrame and use pd.melt().
Unfortunately, I already fail at the very first step, creating the DataFrame.
The year appears in only two column headers, while the rest show "Unnamed: x".
import pandas as pd
# change console output
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
# Read Excel file and create dataframe
df = pd.read_excel("Stackoverflow_example.xlsx")
print(df)
Unnamed: 0 2018 Unnamed: 2 Unnamed: 3 Unnamed: 4 2019 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 Product January February March April January February March April
1 A 8 10 65 50 8 10 65 50
2 B 9 10 65 50 8 63 65 50
3 C 7 10 65 50 8 10 65 50
4 D 8 10 65 50 8 10 65 50
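For context on the output above: Excel keeps the value of a merged cell only in its top-left cell, so the other cells of the merged range are read as empty and pandas labels them "Unnamed: n". Below is a minimal sketch of one possible workaround, forward-filling the year row by hand. It assumes the sheet was read with header=None; the small `raw` frame is a hypothetical in-memory stand-in for that result, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None) on the report
raw = pd.DataFrame([
    [np.nan, 2018, np.nan, np.nan, np.nan, 2019, np.nan, np.nan, np.nan],
    ["Product", "January", "February", "March", "April",
     "January", "February", "March", "April"],
    ["A", 8, 10, 65, 50, 8, 10, 65, 50],
    ["B", 9, 10, 65, 50, 8, 63, 65, 50],
])

# Row 0 holds the years: fill each year across its merged span
years = pd.to_numeric(raw.iloc[0, 1:]).ffill().astype(int)
# Row 1 holds the month names
months = raw.iloc[1, 1:]

# Remaining rows are the data; column 0 holds the product names
data = raw.iloc[2:].set_index(0).astype(int)
data.index.name = "Product"
data.columns = pd.MultiIndex.from_arrays([years, months], names=["Year", "Month"])

print(data)
```

Passing header=[0,1] to read_excel, as in the edit further down, achieves much the same result with less code.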
It would be great if someone could help me on this problem.
Many thanks in advance.
Edit:
Adding header=[0,1], index_col=[0] worked, but I am still struggling to find a way to convert it into a stacked format.
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
print(df)
----------------------------------------------------------------------
2018 2019
Product January February March April January February March April
A 8 10 65 50 8 10 65 50
B 9 10 65 50 8 63 65 50
C 7 10 65 50 8 10 65 50
D 8 10 65 50 8 10 65 50
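A minimal sketch of how a frame with two header levels like the one above might be turned into the Product | Year | Month | Values layout by stacking both column levels at once. The `wide` frame is a hypothetical in-memory copy of the first two products, and the level names "Year" and "Month" are my own choice, not something read from the file.

```python
import pandas as pd

months = ["January", "February", "March", "April"]

# Hypothetical stand-in for the frame read with header=[0,1], index_col=[0]
wide = pd.DataFrame(
    [[8, 10, 65, 50, 8, 10, 65, 50],
     [9, 10, 65, 50, 8, 63, 65, 50]],
    index=["A", "B"],
    columns=pd.MultiIndex.from_product([[2018, 2019], months]),
)

long = (
    wide.rename_axis(index="Product", columns=["Year", "Month"])  # name the levels
        .stack(["Year", "Month"])   # move both column levels into the index
        .rename("Values")           # the result is a Series; name its values
        .reset_index()              # turn the index levels into columns
)

print(long)
```

The row order of the result can differ between pandas versions, so sort explicitly (for example with sort_values) if the order matters.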
df.stack() worked, but it messed up the column header names at the same time: the first column is called "level_0", and "Product" ends up as the header of the month column.
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
df = df.stack().reset_index()
print(df)
-----------------------------------------------------------------------------
level_0 Product 2018 2019
0 A April 50 50
1 A February 10 10
2 A January 8 8
3 A March 65 65
4 B April 50 50
5 B February 10 63
6 B January 9 8
7 B March 65 65
8 C April 50 50
9 C February 10 10
10 C January 7 8
11 C March 65 65
12 D April 50 50
13 D February 10 10
14 D January 8 8
15 D March 65 65
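As far as I can tell, the odd names come from the level names: the header cell "Product" becomes the name of the month level of the columns, while the row index has no name, so reset_index() falls back to "level_0". A small sketch of that behaviour and one possible fix with rename_axis, assuming a hypothetical two-product copy of the data:

```python
import pandas as pd

months = ["January", "February", "March", "April"]

# Hypothetical stand-in: month level is named "Product", row index is unnamed
wide = pd.DataFrame(
    [[8, 10, 65, 50, 8, 10, 65, 50],
     [9, 10, 65, 50, 8, 63, 65, 50]],
    index=["A", "B"],
    columns=pd.MultiIndex.from_product([[2018, 2019], months],
                                       names=[None, "Product"]),
)

# Reproduces the confusing headers
messy = wide.stack().reset_index()
print(list(messy.columns))

# Naming the index levels before reset_index() gives readable headers
fixed = wide.stack().rename_axis(index=["Product", "Month"]).reset_index()
print(list(fixed.columns))
```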
I tried to rename the columns and set the index to "Product", which results in a row that contains only "Product" with empty "cells" below "Month 2018 2019":
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
df = df.stack().reset_index()
df.columns = ["Product", "Month", "2018", "2019"]
df = df.set_index("Product")
print(df)
----------------------------------------------------------
Month 2018 2019
Product
A April 50 50
A February 10 10
A January 8 8
A March 65 65
B April 50 50
B February 10 63
B January 9 8
B March 65 65
C April 50 50
C February 10 10
C January 7 8
C March 65 65
D April 50 50
D February 10 10
D January 8 8
D March 65 65
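From this intermediate shape, pd.melt() (my original idea) looks like it could finish the job. A minimal sketch, assuming "Product" is an ordinary column again (for example after df.reset_index()); `half_long` is a hypothetical shortened copy of the frame above and the output file name is made up.

```python
import pandas as pd

# Hypothetical shortened copy of the frame above, with "Product" as a column
half_long = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Month": ["January", "February", "January", "February"],
    "2018": [8, 10, 9, 10],
    "2019": [8, 10, 8, 63],
})

# Turn the two year columns into a "Year" column plus a "Values" column
long = half_long.melt(
    id_vars=["Product", "Month"],
    var_name="Year",
    value_name="Values",
)
long["Year"] = long["Year"].astype(int)
long = long[["Product", "Year", "Month", "Values"]]

print(long)

# Writing back to Excel (hypothetical file name; needs an Excel writer such as openpyxl):
# long.to_excel("stacked_report.xlsx", index=False)
```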
unstack, answer edited.