Combine index header row and column header row in Pandas

Question

I create a DataFrame and export it to an HTML table. However, the header comes out as two rows: the column names on one row and the index names on the next.

How can I combine the index name row and the column name row?

I want the table header to look like this:

but it currently exports to html like this:

I create the dataframe as below (example):

import pandas

data = [
    {'Name': 'A', 'status': 'ok', 'host': '1', 'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'A', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 06:31:06', 'time2': '-'},
    {'Name': 'B', 'status': 'Alert', 'host': '1', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'B', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'B', 'status': 'ok', 'host': '4', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'Alert', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'ok', 'host': '3', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'ok', 'host': '4', 'time1': '-', 'time2': '-'},
]

df = pandas.DataFrame(data)
# Use the three descriptive columns as a hierarchical (MultiIndex) row index.
df.set_index(['Name', 'status', 'host'], inplace=True)
html_body = df.to_html(bold_rows=False)

The index is set to have hierarchical rows, for easier reading in an html table:

print(df)

                               time1                time2
Name status host                                          
A    ok     1     2020-01-06 06:31:06  2020-02-06 21:10:00
            2     2020-01-06 06:31:06                    -
B    Alert  1     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     2     2020-01-06 10:31:06  2020-02-06 21:10:00
            4     2020-01-06 10:31:06  2020-02-06 21:10:00
C    Alert  2     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     3     2020-01-06 10:31:06  2020-02-06 21:10:00
            4                       -                    -
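For reference, the two-row header can be seen directly in the exported markup. The following is a minimal sketch that uses only two rows of the sample data; the names thead and header_row_count are introduced here for illustration and are not part of the original script.

```python
import pandas as pd

# Two rows of the sample data are enough to show the header layout.
data = [
    {'Name': 'A', 'status': 'ok', 'host': '1',
     'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'A', 'status': 'ok', 'host': '2',
     'time1': '2020-01-06 06:31:06', 'time2': '-'},
]
df = pd.DataFrame(data).set_index(['Name', 'status', 'host'])

html_body = df.to_html(bold_rows=False)

# Isolate the <thead> section and count the rows inside it.
thead = html_body.split('<thead>')[1].split('</thead>')[0]
header_row_count = thead.count('<tr')

print(header_row_count)
print(thead)
```

One of the rows in the printed header holds the column labels (time1, time2) and the other holds the index names (Name, status, host), which is the split the question asks to merge.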

The only solution that I've got working is to set every column as the index. This doesn't seem practical though, and it leaves an empty row that must be removed manually.

@piRSquared - reset_index() will remove the index, i.e. no more merging of duplicate values in the first 2 rows, which I want. The outputted HTML table needs to be readable.
– Killzerman, Commented Jan 6, 2020 at 18:10
That is true, but it happens because of how Pandas renders the MultiIndex. The staggering of the header that you want to avoid occurs because the "columns" you want "merged" are really levels of the MultiIndex. So you probably need to parse the object yourself and build your own HTML table, or you can hack the resulting HTML that you're already getting.
– piRSquared, Commented Jan 6, 2020 at 18:13
df.reset_index() does not alter df itself. It returns a new dataframe, in other words a use-once variable for the .to_html call.
– Code Different, Commented Jan 6, 2020 at 18:17

piRSquared · Accepted Answer · 2020-01-06 19:05:33Z

4

Setup

import pandas as pd
from IPython.display import HTML

l0 = ('Foo', 'Bar')
l1 = ('One', 'Two')
ix = pd.MultiIndex.from_product([l0, l1], names=('L0', 'L1'))
df = pd.DataFrame(1, ix, [*'WXYZ'])

HTML(df.to_html())

BeautifulSoup

Hack the HTML result from df.to_html(header=False). Pluck out the empty cells in the table head and drop in the column names.

from bs4 import BeautifulSoup

html_doc = df.to_html(header=False)
soup = BeautifulSoup(html_doc, 'html.parser')

empty_cols = soup.find('thead').find_all(lambda tag: not tag.contents)

for tag, col in zip(empty_cols, df):
    tag.string = col

HTML(soup.decode_contents())
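The pairing done by zip(empty_cols, df) relies on the fact that iterating over a DataFrame yields its column labels in order, so the first empty header cell receives the first label, the second cell the second label, and so on. Below is a minimal sketch of the same approach wrapped in a function. The function name to_html_single_header is hypothetical, and two details differ from the code above as assumptions on my part: the cell test uses get_text(strip=True) so that cells holding only a non-breaking space also count as empty, and labels are passed through str() in case they are not strings.

```python
import pandas as pd
from bs4 import BeautifulSoup


def to_html_single_header(df):
    """Hypothetical helper: put index names and column labels on one header row."""
    # header=False drops the column-label row, leaving only the index-name
    # row, whose trailing <th> cells (one per column) carry no text.
    soup = BeautifulSoup(df.to_html(header=False), 'html.parser')
    empty_cells = soup.find('thead').find_all(
        lambda tag: tag.name == 'th' and not tag.get_text(strip=True)
    )

    # Iterating over a DataFrame yields its column labels in order, so zip
    # pairs the first empty cell with the first label, and so on.
    for cell, label in zip(empty_cells, df):
        cell.string = str(label)  # str() guards against non-string labels
    return soup.decode_contents()


ix = pd.MultiIndex.from_product([('Foo', 'Bar'), ('One', 'Two')], names=('L0', 'L1'))
df = pd.DataFrame(1, index=ix, columns=[*'WXYZ'])

column_labels = list(df)  # the labels that zip() consumes
html = to_html_single_header(df)

# Read the header cells back to check the result.
thead = BeautifulSoup(html, 'html.parser').find('thead')
header_cells = [th.get_text() for th in thead.find_all('th')]
print(column_labels)
print(header_cells)
```

Because zip stops at the shorter of its inputs, any mismatch between the number of empty cells and the number of columns silently truncates rather than raising, which is worth keeping in mind when adapting this to other tables.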

edited Jan 6, 2020 at 19:05

answered Jan 6, 2020 at 19:00

piRSquared


5 Comments

Killzerman Over a year ago

So the header "fixing" isn't something that Pandas can do, or even cares about? Which would be fair enough, I'll give the above a test, using BeautifulSoup in the script anyhow :)

Killzerman Over a year ago

This is more or less perfect for me and will fix a couple of tables, thank you! A final question: what does zip(empty_cols, df) do in the for loop? I can see that empty_cols references the empty column headers passed to the soup object, but how does zip() pick out the correct column names for these?

Stop War Over a year ago

How can I have a header name for the row index column (the extreme left column) in dataframe html table?

piRSquared Over a year ago

That’s a different question altogether. Look at the method rename_axis

Nuclear241 Over a year ago

I don't know how or why but I need to use tag.string = str(col) instead of col. Just thought this might help anyone.
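On the rename_axis suggestion in the comments above: a minimal sketch, assuming a small made-up frame with a plain (single-level) index, of how naming the index gives the leftmost column a header cell in the exported HTML. The frame contents and the name 'L0' are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2], 'X': [3, 4]}, index=['Foo', 'Bar'])
print(df.index.name)  # None: the index column has no header text yet

# rename_axis returns a new DataFrame with the index name set; df is unchanged.
named = df.rename_axis('L0')
print(named.index.name)

html = named.to_html()
print('<th>L0</th>' in html)
```

Note that once the index has a name, to_html emits that name on its own header row, which is the same two-row layout the question starts from.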

esPIEnage · Answer · answered 2021-08-25 17:39, last edited by Dharman 2021-08-25 17:45:09Z

3

If you want to use a DataFrame Styler to format the table, its elements, and its contents, you might need a slight change to piRSquared's answer, as I did.

[Screenshot: before transformation]

Styler.to_html() fills the blank header cells with non-breaking spaces, so tag.contents is never empty and the original lambda matches nothing; the table came out unchanged. I modified the lambda to account for this, which revealed another issue.

lambda tag: (not tag.contents) or '\xa0' in tag.contents

[Screenshot: cells were copied strangely]

Styler.to_html() lacks the header kwarg, which I am guessing is the source of the issue. I took a slightly different approach: move the second-row headers into the first row, and then destroy the second header row.

It seems pretty generic and reusable for any multi-indexed dataframe.

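from bs4 import BeautifulSoup

# summary_df is the author's own multi-indexed DataFrame (not shown here).
df_styler = summary_df.style
# Use the df_styler to change display format, color, alignment, etc.
raw_html = df_styler.to_html()
soup = BeautifulSoup(raw_html, 'html.parser')
head = soup.find('thead')
trs = head.find_all('tr')
# Blank cells in the first header row (empty, or holding a non-breaking space).
ths0 = trs[0].find_all(lambda tag: (not tag.contents) or '\xa0' in tag.contents)
# Filled cells in the second header row (the index names).
ths1 = trs[1].find_all(lambda tag: tag.contents and '\xa0' not in tag.contents)
# Swap each blank first-row cell for the matching index-name cell.
for blank, filled in zip(ths0, ths1):
    blank.replace_with(filled)
# The second header row is no longer needed.
trs[1].decompose()
final_html_str = soup.decode_contents()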

[Screenshot: success, two header rows condensed into one]

Big thanks to piRSquared for the BeautifulSoup starting point!

edited Aug 25, 2021 at 17:45

Dharman

answered Aug 25, 2021 at 17:39

esPIEnage


1 Comment

Shannon Over a year ago

I have been using piRSquared's solution for several years and a couple of months ago the headings from the first row stopped appearing in my final table. Adding in the or '\xa0' check in the lambda has fixed it for me with no other changes. Not sure what changed in pandas to cause that, but thanks!
