Pandas - read_html with index_col not intended output when doing to_html

Question

I may just not understand pandas fully, but I am getting some unexpected behavior when I use read_html() with the index_col flag set, modify the DataFrame, and then call to_html() again.

Here is what I mean. I have this HTML file:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>build1</td>
      <td>55.102323</td>
      <td>37.101219</td>
      <td>60.7</td>
    </tr>
  </tbody>
</table>

I then use pandas read_html as follows:

import pandas as pd

dataFrameList = pd.read_html('empty.html', index_col=0)  # one DataFrame per <table>
df = dataFrameList[0]

This produces a data frame as follows:

              Avg        Min   Max
index                             
build1  55.102323  37.101219  60.7

I then have a small bit of test code that looks like this:

df.drop(['build1'], inplace=True)
df.loc['build2'] = [121212, 12443, 1290120]
print(df.to_html())

I get the following output:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
    <tr>
      <th>index</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>

What did I do wrong? I have tried passing index=False to to_html(), but that removes the build names (which I need).

My desired output (just so that it is clear) is as follows:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>

noisefield · Accepted Answer · 2017-07-17 14:53:32Z


There is a workaround:

df.insert(0, 'index', df.index)
print(df.to_html(index=False))

This produces the desired output, except for the <th> around build2 in the second row, which I assume is a typo: with index=False, build2 is rendered as an ordinary <td> cell.
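A self-contained sketch of the workaround follows. It does not call read_html; instead it assumes, as the printed frame in the question suggests, that read_html(..., index_col=0) left an index named "index", and builds the final frame directly. It also compares the result with df.reset_index(), which should produce the same table without mutating df.

```python
import pandas as pd

# Assumption: this is the frame the question ends up with after the
# drop/loc steps, i.e. an index whose *name* is "index" (read_html took it
# from the header cell because of index_col=0).
df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

# The answer's workaround, applied to a copy: put the index values into a
# real column at position 0, then render without the index.
patched = df.copy()
patched.insert(0, "index", patched.index)
html_insert = patched.to_html(index=False)

# For comparison: reset_index() moves the index into a column named after
# the index ("index" here) and returns a new frame, leaving df untouched.
html_reset = df.reset_index().to_html(index=False)

print(html_insert)
```

reset_index() names the new column after the index (falling back to "index" if the index is unnamed), so the top-left header cell reads index in both versions.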


5 Comments

Abdall Over a year ago

This works wonderfully, but if you could add an explanation I would be extremely thankful.

noisefield Over a year ago

The to_html function just renders the table as it is. Try displaying the DataFrame in a Jupyter notebook and you will see that there really are two header rows: one for the column labels and one for the index name(s) (an index can have several levels, i.e. a MultiIndex). The second row appears because read_html(..., index_col=0) named the index after the first header cell, index. What I am doing above is creating a fake column called index at position 0 and copying the index values into it. Then I write the table to HTML without the actual index.

Corley Brigman Over a year ago

yeah, if you look at the table it outputs, and your text output above, they both look similar (columns on row 0, 'index' as a header to the index on row 1, actual index rows start on row 2).

smci Over a year ago

@noisefield: neat, please edit the explanation of the issue into your answer (then it gets indexed and is searchable, unlike comments, and isn't prone to disappearing ephemerally).

smci Over a year ago

(Was this a version-specific bug back in 2017? If so, which versions were affected, and is it fixed or still reproducible in the current version?)
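The explanation in the comments can be checked with a small sketch. It again builds the frame by hand, assuming the index is named "index" as read_html left it, and compares the header rows that to_html emits with and without that index name.

```python
import pandas as pd

# Assumption: same shape as the frame read_html(..., index_col=0) produced,
# with the index *named* "index".
df = pd.DataFrame(
    {"Avg": [55.102323], "Min": [37.101219], "Max": [60.7]},
    index=pd.Index(["build1"], name="index"),
)

# A named index makes to_html emit a second <thead> row holding that name.
named_html = df.to_html()

# rename_axis(None) clears the index name, which removes that extra row,
# but the top-left header cell is then empty instead of showing "index".
unnamed_html = df.rename_axis(None).to_html()

# Count <tr> elements: header row(s) plus one body row.
print(named_html.count("<tr"), unnamed_html.count("<tr"))
```

Clearing the name removes the extra row but loses the index label in the header, which is why the accepted answer copies the index into a real column instead.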
