Convert (flatten) multiple header Pandas dataframe

Question

I have the following Pandas dataframe taken from an Excel file (link to the Excel file)

I would like to flatten the Excel table with Pandas by converting the current headers (the first two rows) into dataframe columns. This is where I want to get to:

segment  unit  category  sub_category  value
seg1     kg    cat01     sub_cat_1.1   1
seg2     kg    cat01     sub_cat_1.1   2
seg1     kg    cat01     sub_cat_1.2   3
seg2     kg    cat01     sub_cat_1.2
seg1     kg    cat02     sub_cat_2.1   4
seg2     kg    cat02     sub_cat_2.1   5

Here is what I have tried so far, but it doesn't work as expected:

import pandas as pd

_file_name = "stackoverflow_excel_data_example.xlsx"
df = pd.read_excel(_file_name,  header=[0,1]).sort_index()
df = df.stack()
print(df)

Does anyone know how to convert this kind of custom, pivot-style table to a flat dataframe?

Corralien · Accepted Answer · 2021-06-11 16:02:05Z


No real magic here: you need to reorganize the column MultiIndex first, relabeling the two identifier columns and naming the two header levels:

df.columns = pd.MultiIndex.from_tuples([('segment', ''), ('unit', '')] +
                                       df.columns[2:].to_list(),
                                       names=df.columns[1])

At this point, df looks like:

>>> df
category     segment unit       cat01                   cat02
sub_category              sub_cat_1.1 sub_cat_1.2 sub_cat_2.1 sub_cat_2.1.1 sub_cat_2.1.2 sub_cat_2.1.3 sub_cat_2.1.4
0               seg1   kg           1         3.0           4           NaN           NaN           NaN           NaN
1               seg2   kg           2         NaN           5           NaN           NaN           NaN           NaN
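The Excel file is not available here, so the following is only a sketch of the relabeling step on a small stand-in frame. The labels of the first two columns are guesses (assumptions, not taken from the original sheet), chosen so that the second column label carries the level names `category` and `sub_category`, as the `names=df.columns[1]` argument above implies.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: stand-in for pd.read_excel(..., header=[0, 1]).
# The first two labels are hypothetical; the real sheet may differ.
raw_columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Unnamed: 0_level_1"),
    ("category", "sub_category"),
    ("cat01", "sub_cat_1.1"),
    ("cat01", "sub_cat_1.2"),
    ("cat02", "sub_cat_2.1"),
])
df = pd.DataFrame(
    [["seg1", "kg", 1, 3.0, 4], ["seg2", "kg", 2, np.nan, 5]],
    columns=raw_columns,
)

# Relabel the two identifier columns and name the two header levels.
# The right-hand side is evaluated before the assignment, so df.columns[1]
# still refers to the old ("category", "sub_category") label here.
df.columns = pd.MultiIndex.from_tuples(
    [("segment", ""), ("unit", "")] + df.columns[2:].to_list(),
    names=df.columns[1],
)

print(df.columns.names)
print(df.columns.to_list()[:2])
```

Giving the identifier columns an empty second-level label is what later lets `df["segment"]` and `set_index(["segment", "unit"])` address them by a single name.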

Now you can apply the transformation:

>>> df.set_index(["segment", "unit"]) \
      .stack(level=[0, 1])\
      .rename("value") \
      .reset_index()

  segment unit category sub_category  value
0    seg1   kg    cat01  sub_cat_1.1    1.0
1    seg1   kg    cat01  sub_cat_1.2    3.0
2    seg1   kg    cat02  sub_cat_2.1    4.0
3    seg2   kg    cat01  sub_cat_1.1    2.0
4    seg2   kg    cat02  sub_cat_2.1    5.0
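For anyone who wants to try the pipeline without the Excel file, here is a minimal self-contained sketch. It assumes the columns have already been relabeled as described above, and the in-memory frame is a hypothetical reduction of the example data. The explicit `.dropna()` and the final sort are additions of this sketch, not part of the original answer: newer pandas releases are moving to a `stack` implementation that no longer drops missing values and may order rows differently, so making both steps explicit should keep the result stable across versions.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory version of the relabeled frame (not the real sheet).
columns = pd.MultiIndex.from_tuples(
    [
        ("segment", ""), ("unit", ""),
        ("cat01", "sub_cat_1.1"), ("cat01", "sub_cat_1.2"),
        ("cat02", "sub_cat_2.1"),
    ],
    names=["category", "sub_category"],
)
df = pd.DataFrame(
    [["seg1", "kg", 1, 3.0, 4], ["seg2", "kg", 2, np.nan, 5]],
    columns=columns,
)

flat = (
    df.set_index(["segment", "unit"])  # identifiers become index levels
      .stack(level=[0, 1])             # both header levels move into the index
      .dropna()                        # drop empty cells explicitly
      .rename("value")                 # name the resulting Series
      .reset_index()                   # index levels become ordinary columns
      .sort_values(["segment", "category", "sub_category"], ignore_index=True)
)

print(flat)
```

On pandas 2.1 and later, the `stack` call may emit a `FutureWarning` about the implementation change; the warning does not affect the result of this sketch.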


1 Comment

abdeltif-b Over a year ago

Exactly what I needed. Thank you so much!

Anton · Accepted Answer · 2021-06-12 06:05:44Z


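df = pd.read_excel(..., header=[0, 1])
df = (
    df
    .iloc[:, 2:]                            # keep only the value columns
    .set_index(df.iloc[:, 0])               # first column becomes an index level
    .set_index(df.iloc[:, 1], append=True)  # second column becomes another level
    .stack([0, 1])                          # move both header levels into the index
    .rename_axis(["segment", "unit", "category", "sub_category"])
    .rename("value")
    .reset_index()
)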

The result for the provided example input is
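To try this positional approach without the Excel file, the sketch below runs the same chain on a small in-memory frame. The frame and its first two column labels (`id_a`, `id_b`, and so on) are hypothetical placeholders. Because the chain selects columns by position, those labels should not matter. The `.dropna()` call is an addition of this sketch, intended to keep empty cells out of the result on pandas versions where `stack` does not drop them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel sheet; the first two labels are placeholders.
raw = pd.DataFrame(
    [["seg1", "kg", 1, 3.0, 4], ["seg2", "kg", 2, np.nan, 5]],
    columns=pd.MultiIndex.from_tuples([
        ("id_a", "id_b"),
        ("category", "sub_category"),
        ("cat01", "sub_cat_1.1"),
        ("cat01", "sub_cat_1.2"),
        ("cat02", "sub_cat_2.1"),
    ]),
)

flat = (
    raw.iloc[:, 2:]                             # value columns only
       .set_index(raw.iloc[:, 0])               # segment values as index
       .set_index(raw.iloc[:, 1], append=True)  # unit values as second level
       .stack([0, 1])                           # both header levels into the index
       .dropna()                                # drop empty cells explicitly
       .rename_axis(["segment", "unit", "category", "sub_category"])
       .rename("value")
       .reset_index()
)

print(flat)
```

Row order can differ between pandas versions, so sort the result explicitly if a particular order is needed.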

edited Jun 12, 2021 at 6:05

answered Jun 11, 2021 at 15:58


2 Comments

Corralien Over a year ago

Good answer really. +1. add .rename('value').reset_index() (and swap category and sub_category)

abdeltif-b Over a year ago

Thank you for your answer. As mentioned by @Corralien, I had to add .rename('value').reset_index()
