I may just not understand pandas fully, but I am getting some unexpected behavior when I use read_html() with the index_col parameter set, modify the DataFrame, and then call to_html() on it.

Here is what I mean. I have this HTML file:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>build1</td>
      <td>55.102323</td>
      <td>37.101219</td>
      <td>60.7</td>
    </tr>
  </tbody>
</table>

I then use pandas read_html as follows:

import pandas as pd
dataFrameList = pd.read_html('empty.html', index_col=0)  # one DataFrame per <table>; column 0 becomes the index
df = dataFrameList[0]

This produces a data frame as follows:

              Avg        Min   Max
index                             
build1  55.102323  37.101219  60.7
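The `index` line under the column labels in that printout is the index's *name*: read_html(..., index_col=0) took it from the first header cell. As a minimal sketch, the same frame can be built by hand (this assumes it matches what read_html returns here, and avoids needing the HTML file or read_html's parser dependencies):

```python
import pandas as pd

# Hand-built stand-in for read_html('empty.html', index_col=0)[0]:
# the "index" header cell ends up as the name of the row index.
df = pd.DataFrame(
    {"Avg": [55.102323], "Min": [37.101219], "Max": [60.7]},
    index=pd.Index(["build1"], name="index"),
)

print(df.index.name)     # the index is named "index"
print(list(df.columns))  # the remaining header cells become column labels
```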

I then have a small bit of test code that looks like this:

df.drop(['build1'], inplace=True)            # remove the only existing row
df.loc['build2'] = [121212, 12443, 1290120]  # add a new row labelled 'build2'
print(df.to_html())

I get the following output:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
    <tr>
      <th>index</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>
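The second header row in that output comes from the index name. When the index is named and to_html() is called with its default index=True, pandas writes the column labels on one header row and the index name on a separate row below it. A small check of that behavior, using a hand-built frame assumed to mirror the one above (`header_rows` is a hypothetical helper written for this illustration):

```python
import pandas as pd

def header_rows(frame: pd.DataFrame) -> int:
    """Hypothetical helper: count the <tr> rows inside the <thead> of frame.to_html()."""
    thead = frame.to_html().split("<tbody>")[0]
    return thead.count("<tr")

df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

unnamed = df.copy()
unnamed.index.name = None  # same data, but the index has no name

print(header_rows(df), header_rows(unnamed))  # 2 1
```

Clearing the name removes the extra row, but it also loses the "index" label, which is why a workaround is still needed.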

What did I do wrong? I have tried passing index=False to to_html(), but that drops the build names (which I need).

My desired output (just so that it is clear) is as follows:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>Avg</th>
      <th>Min</th>
      <th>Max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>build2</th>
      <td>121212.0</td>
      <td>12443.0</td>
      <td>1290120.0</td>
    </tr>
  </tbody>
</table>


There is a workaround:

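df.insert(0, 'index', df.index)  # copy the index values into a regular first column named 'index'
print(df.to_html(index=False))   # render without the real index, so no index-name row is written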

This produces the desired output, except that build2 ends up in a <td> cell rather than a <th> (I assume the <th> in the data row of your desired output is a typo).
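A self-contained check of the workaround, assuming a hand-built frame in place of the one read from empty.html:

```python
import pandas as pd

df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

# Copy the index values into an ordinary first column called "index" ...
df.insert(0, "index", df.index)
# ... then render without the real index, so only one header row is produced
# and every body cell (including the build name) is a <td>.
html = df.to_html(index=False)
print(html)
```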


Comments:

This works wonderfully, but if you could add an explanation I would be extremely thankful.
to_html renders the whole DataFrame, index included. If you display the table in a Jupyter notebook, you will see that there really are two header rows: one for the column labels and a second one for the index name(s) (one name per level, since the index can be a MultiIndex). Your index is named 'index' because read_html(..., index_col=0) used the first header cell as the index name, which is why that second row appears. What I am doing above is creating an ordinary column called 'index' at position 0, copying the index values into it, and then rendering the HTML without the actual index.
Yeah, if you compare the HTML table it outputs with your text output above, they have the same layout (column labels on row 0, the index name 'index' on row 1, and the actual index rows starting on row 2).
@noisefield: neat, please edit the explanation of the issue into your answer (that way it gets indexed and is searchable, unlike comments, and it won't vanish if the comments are cleaned up).
(Was this a version-specific bug back in 2017? If so, which versions? Is it fixed now, and does this still work in the current version?)
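If you want build2 to stay in a <th> cell as in the desired output, one alternative (a different technique from the answer above, and a sketch that relies on how recent pandas versions lay out the header) is to move the "index" label from the index name to the columns name. With an unnamed index and a named columns axis, to_html puts the columns name in the top-left header cell and writes no second header row, while the index values are still rendered as <th>:

```python
import pandas as pd

df = pd.DataFrame(
    {"Avg": [121212.0], "Min": [12443.0], "Max": [1290120.0]},
    index=pd.Index(["build2"], name="index"),
)

# Move the "index" label from the row axis to the column axis.
df.columns.name = df.index.name
df.index.name = None

# Keep the real index (index=True by default) so build2 stays a <th>.
html = df.to_html()
print(html)
```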
