
I create a DataFrame and export it to an HTML table. However, the headers are off, as shown below.

How can I combine the index-name row and the column-name row?

I want the table header to be a single row like this:

Name | status | host | time1 | time2

but it currently exports to HTML with the index names (Name, status, host) on a second header row, below the column names (time1, time2).

I create the dataframe as below (example):

import pandas

data = [
    {'Name': 'A', 'status': 'ok', 'host': '1', 'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'A', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 06:31:06', 'time2': '-'},
    {'Name': 'B', 'status': 'Alert', 'host': '1', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'B', 'status': 'ok', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'B', 'status': 'ok', 'host': '4', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'Alert', 'host': '2', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'ok', 'host': '3', 'time1': '2020-01-06 10:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'C', 'status': 'ok', 'host': '4', 'time1': '-', 'time2': '-'},
]

df = pandas.DataFrame(data)
df.set_index(['Name', 'status', 'host'], inplace=True)
html_body = df.to_html(bold_rows=False)
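To see where the second header row comes from, one can inspect the `<thead>` that `to_html` generates. The sketch below uses a trimmed copy of the data above and only pandas plus plain string operations; in the pandas versions I know of, the column names and the index names are written as two separate `<tr>` rows, though the exact markup may differ between versions.

```python
import pandas as pd

# Trimmed copy of the question's data (two rows are enough to show the header).
data = [
    {'Name': 'A', 'status': 'ok', 'host': '1',
     'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'A', 'status': 'ok', 'host': '2',
     'time1': '2020-01-06 06:31:06', 'time2': '-'},
]
df = pd.DataFrame(data).set_index(['Name', 'status', 'host'])

html_body = df.to_html(bold_rows=False)

# Cut out the <thead> section and split it into its <tr> rows.
thead = html_body.split('<thead>')[1].split('</thead>')[0]
header_rows = thead.split('</tr>')[:-1]  # the last piece is trailing whitespace

print(len(header_rows))           # 2: column names first, then index names
print('time1' in header_rows[0])  # True
print('Name' in header_rows[1])   # True
```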

The index is set to hierarchical (multi-level) rows for easier reading in an HTML table:

print(df)

                               time1                time2
Name status host                                          
A    ok     1     2020-01-06 06:31:06  2020-02-06 21:10:00
            2     2020-01-06 06:31:06                    -
B    Alert  1     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     2     2020-01-06 10:31:06  2020-02-06 21:10:00
            4     2020-01-06 10:31:06  2020-02-06 21:10:00
C    Alert  2     2020-01-06 10:31:06  2020-02-06 21:10:00
     ok     3     2020-01-06 10:31:06  2020-02-06 21:10:00
            4                       -                    -

The only solution I've got working is to set every column as the index. This doesn't seem practical, though, and it leaves an empty row that must be removed manually.

  • df.reset_index().to_html(index=False, bold_rows=False) Commented Jan 6, 2020 at 18:04
  • @piRSquared - reset_index() removes the index, i.e. the duplicate values in the first two index levels are no longer merged, and I want that merging. The output HTML table needs to be readable. Commented Jan 6, 2020 at 18:10
  • That is true. But that is happening because Pandas does that to the MultiIndex. The staggering of the columns that you want to avoid is happening because the "Columns" you want "Merged" are really levels of the MultiIndex. So you probably need to parse the object yourself and make your own html table. Or you can hack the resulting html that you're already getting. Commented Jan 6, 2020 at 18:13
  • df.reset_index() does not alter df itself. It returns a new DataFrame; in other words, a use-once object for the .to_html call. Commented Jan 6, 2020 at 18:17
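To make the trade-off in these comments concrete, here is a pandas-only sketch on a small made-up frame shaped like the question's data. It checks that `reset_index()` leaves `df` untouched, that rendering the flattened copy with `index=False` gives a single header row, and that the repeated values are then no longer merged with `rowspan` (which the MultiIndex version does).

```python
import pandas as pd

# Small made-up frame in the shape of the question's data.
df = pd.DataFrame({
    'Name': ['A', 'A', 'B'],
    'status': ['ok', 'ok', 'Alert'],
    'host': ['1', '2', '1'],
    'time1': ['t1', 't2', 't3'],
}).set_index(['Name', 'status', 'host'])

# reset_index() returns a new DataFrame; df keeps its MultiIndex.
flat_html = df.reset_index().to_html(index=False, bold_rows=False)
thead = flat_html.split('<thead>')[1].split('</thead>')[0]

print(thead.count('<tr'))         # 1: a single header row
print('rowspan' in flat_html)     # False: repeated values are not merged
print('rowspan' in df.to_html())  # True: the MultiIndex version merges them
print(list(df.index.names))       # ['Name', 'status', 'host']
```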

2 Answers


Setup

import pandas as pd
from IPython.display import HTML

l0 = ('Foo', 'Bar')
l1 = ('One', 'Two')
ix = pd.MultiIndex.from_product([l0, l1], names=('L0', 'L1'))
df = pd.DataFrame(1, ix, [*'WXYZ'])

HTML(df.to_html())



BeautifulSoup

Hack the HTML result from df.to_html(header=False). Pluck out the empty cells in the table head and drop in the column names.

from bs4 import BeautifulSoup

html_doc = df.to_html(header=False)
soup = BeautifulSoup(html_doc, 'html.parser')

empty_cols = soup.find('thead').find_all(lambda tag: not tag.contents)

for tag, col in zip(empty_cols, df):
    tag.string = str(col)  # str() in case a column label is not a string

HTML(soup.decode_contents())
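The same steps can be packaged into a reusable function. This is a sketch, not piRSquared's exact code: the name `to_html_single_header` is hypothetical, it requires BeautifulSoup (`bs4`), and it assumes a named index (otherwise pandas writes no index-name row) and flat, non-MultiIndex columns. As a design choice it selects the cells to fill by position (the last `len(df.columns)` cells of the index-name row) rather than by emptiness, so an unnamed index level cannot receive a column label by mistake.

```python
import pandas as pd
from bs4 import BeautifulSoup


def to_html_single_header(df, **to_html_kwargs):
    """Render df with index names and column labels on one header row.

    Hypothetical helper: render without the column-label row (header=False),
    then write the column labels into the index-name row that pandas still
    emits when the index is named.
    """
    html = df.to_html(header=False, **to_html_kwargs)
    soup = BeautifulSoup(html, 'html.parser')
    header_row = soup.find('thead').find('tr')
    # The data-column cells are the last len(df.columns) <th> cells of the row.
    cells = header_row.find_all('th')[-len(df.columns):]
    for cell, col in zip(cells, df.columns):
        cell.string = str(col)  # str() guards against non-string labels
    return soup.decode_contents()


# Usage with a trimmed copy of the question's data.
data = [
    {'Name': 'A', 'status': 'ok', 'host': '1',
     'time1': '2020-01-06 06:31:06', 'time2': '2020-02-06 21:10:00'},
    {'Name': 'A', 'status': 'ok', 'host': '2',
     'time1': '2020-01-06 06:31:06', 'time2': '-'},
]
df = pd.DataFrame(data).set_index(['Name', 'status', 'host'])
html_body = to_html_single_header(df, bold_rows=False)
```

The rows of the result still use `rowspan` for the repeated `Name`/`status` values, so the merging the asker wanted is kept.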



5 Comments

So the header "fixing" isn't something that pandas can do, or even cares about? That would be fair enough. I'll give the above a test; I'm using BeautifulSoup in the script anyhow :)
This is more or less perfect for me and will fix a couple of tables, thank you! A final question: what does zip(empty_cols, df) do in the for loop? I can see that empty_cols holds the empty header cells from the soup object, but how does zip() pick out the correct column names for these?
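On the `zip(empty_cols, df)` question: iterating over a DataFrame yields its column labels (the same as `list(df.columns)`), and `zip` pairs items by position, stopping at the shorter input. So the first empty header cell gets the first column label, and so on. A pandas-only illustration, with plain strings standing in for the empty `<th>` tags:

```python
import pandas as pd

df = pd.DataFrame(1, index=[0, 1], columns=[*'WXYZ'])

# Iterating a DataFrame yields column labels, not rows.
print(list(df))  # ['W', 'X', 'Y', 'Z']

# Stand-ins for the empty <th> tags found in the table head.
empty_cols = ['th-1', 'th-2', 'th-3', 'th-4']
for tag, col in zip(empty_cols, df):
    print(tag, '->', col)  # th-1 -> W, th-2 -> X, th-3 -> Y, th-4 -> Z

# zip stops at the shorter input.
print(list(zip(['th-1', 'th-2'], df)))  # [('th-1', 'W'), ('th-2', 'X')]
```

Because the pairing is purely positional, it relies on the empty cells appearing in the same order as the columns and on no other header cell being empty.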
How can I get a header name for the row index column (the leftmost column) in a DataFrame HTML table?
That’s a different question altogether. Look at the rename_axis method.
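A minimal sketch of that suggestion, with made-up data and a hypothetical label `'item'`: `DataFrame.rename_axis` returns a copy whose index is named, and `to_html` writes that name into the header. Note that a named index brings back the separate index-name header row from the question, so the header-merging above may still be needed.

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20]}, index=['a', 'b'])
named = df.rename_axis('item')  # returns a new frame; df is unchanged

html = named.to_html()
print('<th>item</th>' in html)  # True
print(df.index.name)            # None
```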
I don't know how or why, but I needed to use tag.string = str(col) instead of col. Just thought this might help someone.
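A likely explanation for needing `str(col)`: BeautifulSoup's `Tag.string` setter expects text, and non-string column labels (for example the integer labels of a DataFrame built without column names) are rejected. The exact exception type seems to differ between bs4 versions, so this sketch just catches `Exception`:

```python
import pandas as pd
from bs4 import BeautifulSoup

df = pd.DataFrame([[1, 2]])  # no column names: the labels are the ints 0 and 1
soup = BeautifulSoup('<table><thead><tr><th></th><th></th></tr></thead></table>',
                     'html.parser')
cells = soup.find_all('th')

try:
    cells[0].string = df.columns[0]  # an int, not a str
    raised = False
except Exception:  # TypeError or AttributeError, depending on the bs4 version
    raised = True

for tag, col in zip(cells, df):
    tag.string = str(col)  # converting to str first works

print(raised)                           # True
print([th.get_text() for th in cells])  # ['0', '1']
```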

If you want to use a DataFrame Styler to format the table, its elements, and their contents, you might need a slight change to piRSquared's answer, as I did.

[Screenshot not preserved: the table before transformation.]

Styler.to_html() fills blank header cells with non-breaking spaces, so tag.contents is never empty (always truthy), the lambda matches nothing, and the table is left unchanged. I modified the lambda to account for this, which revealed another issue.

lambda tag: (not tag.contents) or '\xa0' in tag.contents
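Why the extra check is needed: when `html.parser` parses the `&nbsp;` entity it produces the character `'\xa0'`, so a "blank" cell has one child string and `not tag.contents` is False. A bs4-only sketch on a hand-written header row (a simplified stand-in, not actual Styler output); `original` and `with_nbsp` are illustrative names for the two lambdas:

```python
from bs4 import BeautifulSoup

row_html = '<tr><th>&nbsp;</th><th></th><th>W</th></tr>'
soup = BeautifulSoup(row_html, 'html.parser')


def original(tag):
    return not tag.contents


def with_nbsp(tag):
    return (not tag.contents) or '\xa0' in tag.contents


print(soup.find('th').get_text() == '\xa0')  # True: &nbsp; became '\xa0'
print(len(soup.find_all(original)))          # 1: only the truly empty <th>
print(len(soup.find_all(with_nbsp)))         # 2: both blank cells
```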

With that change, the cells were copied strangely (screenshot not preserved).

Styler.to_html() lacks the header keyword argument, and I am guessing this is the source of the issue. I took a slightly different approach: move the second-row headers into the first row, then remove the second header row.

It seems pretty generic and reusable for any multi-indexed dataframe.

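from bs4 import BeautifulSoup

# summary_df is your multi-indexed DataFrame.
df_styler = summary_df.style
# Use the df_styler to change display format, color, alignment, etc.
raw_html = df_styler.to_html()
soup = BeautifulSoup(raw_html, 'html.parser')
head = soup.find('thead')
trs = head.find_all('tr')
# Blank cells of the first header row: empty, or holding only a non-breaking space.
ths0 = trs[0].find_all(lambda tag: (not tag.contents) or '\xa0' in tag.contents)
# Every cell of the second header row, in order (the index names come first).
ths1 = trs[1].find_all('th')
# Move each second-row cell into the next blank slot of the first row.
for blank, filled in zip(ths0, ths1):
    blank.replace_with(filled)
trs[1].decompose()  # the second header row is now redundant
final_html_str = soup.decode_contents()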

Success: the two header rows are condensed into one.
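The same move-then-delete idea can be written as a small function that works on any HTML string, for example the output of `Styler.to_html()`. This is a sketch: `collapse_header_rows` and `is_blank` are hypothetical names, it assumes the `<thead>` has at least two rows whose index names sit at the start of the second row, and the sample below is a simplified hand-written stand-in rather than real Styler output (Styler's classes and ids are omitted). Treating any whitespace-only cell as blank covers both empty cells and `&nbsp;` cells.

```python
from bs4 import BeautifulSoup


def is_blank(tag):
    # Blank if it has no text or only whitespace; get_text(strip=True)
    # also strips the non-breaking space ('\xa0') that &nbsp; parses to.
    return tag.name == 'th' and tag.get_text(strip=True) == ''


def collapse_header_rows(raw_html):
    soup = BeautifulSoup(raw_html, 'html.parser')
    first, second = soup.find('thead').find_all('tr')[:2]
    blanks = [th for th in first.find_all('th') if is_blank(th)]
    # Move the second row's cells into the first row's blank slots, in order;
    # zip stops when either list runs out.
    for blank, filled in zip(blanks, second.find_all('th')):
        blank.replace_with(filled)
    second.decompose()  # the second header row is now redundant
    return soup.decode_contents()


sample = (
    '<table><thead>'
    '<tr><th>&nbsp;</th><th>&nbsp;</th><th>W</th><th>X</th></tr>'
    '<tr><th>L0</th><th>L1</th><th>&nbsp;</th><th>&nbsp;</th></tr>'
    '</thead><tbody>'
    '<tr><th>Foo</th><th>One</th><td>1</td><td>1</td></tr>'
    '</tbody></table>'
)
collapsed = collapse_header_rows(sample)
```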

Big thanks to piRSquared for the BeautifulSoup starting point!

1 Comment

I have been using piRSquared's solution for several years and a couple of months ago the headings from the first row stopped appearing in my final table. Adding in the or '\xa0' check in the lambda has fixed it for me with no other changes. Not sure what changed in pandas to cause that, but thanks!
