
I've come across the following error about html5lib when trying to read an HTML table into a DataFrame with pd.read_html.

Here is the code:

!pip install html5lib
!pip install lxml
!pip install beautifulsoup4

import pandas as pd
import html5lib
import lxml
from bs4 import BeautifulSoup

table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

This is the error:

ImportError                               Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913                   thousands=thousands, attrs=attrs, encoding=encoding,
    914                   decimal=decimal, converters=converters, na_values=na_values,
--> 915                   keep_default_na=keep_default_na)

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    737     retained = None
    738     for flav in flavor:
--> 739         parser = _parser_dispatch(flav)
    740         p = parser(io, compiled_match, attrs, encoding)
    741 

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
    680     if flavor in ('bs4', 'html5lib'):
    681         if not _HAS_HTML5LIB:
--> 682             raise ImportError("html5lib not found, please install it")
    683         if not _HAS_BS4:
    684             raise ImportError(

ImportError: html5lib not found, please install it

Any help would be much appreciated. Thanks.

5 Answers


As the error message says, html5lib is not installed. Run:

pip install html5lib

in your terminal.


If you are running the code from a Jupyter notebook (as the ! prefix suggests), restart the kernel so that the newly installed packages are loaded.
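One common reason a freshly installed package still is not found is that `!pip` can install into a different Python environment than the one the kernel runs (the traceback shows the kernel is Sage's bundled Python 2.7). The sketch below, which assumes Python 3.4+ for importlib.util.find_spec, prints the interpreter the kernel is using and lists which of the parser modules it cannot import. The helper name missing_parsers is hypothetical, not part of pandas.

```python
import importlib.util
import sys

def missing_parsers(names=("lxml", "bs4", "html5lib")):
    """Return the module names from `names` that this interpreter cannot import.

    missing_parsers is a hypothetical helper, not a pandas function.
    """
    # find_spec returns None when a top-level module cannot be found
    return [name for name in names if importlib.util.find_spec(name) is None]

# The interpreter this kernel/script actually runs; packages must be installed here.
print(sys.executable)
print("missing:", missing_parsers())
```

If html5lib is listed as missing, install it into that interpreter (in a notebook, IPython's `%pip install html5lib` magic targets the running kernel) and then restart the kernel.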


Comments

Are you running your code in a Jupyter notebook? If so, have you tried restarting the kernel?
Yes, I'm using Jupyter. I just restarted the kernel and it runs fine now. Thanks Yilun ;)

I got this exact error while trying to read a saved .htm file in the Spyder IDE.

This code raised the html5lib error:

import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")

I knew I had html5lib installed and working correctly because I had other scripts that worked.

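For whatever reason, the file path needed to be a raw string literal (putting an r in front of the opening quote). In a regular string literal, backslashes start escape sequences, so the path pandas receives is not necessarily the one you typed.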

This code works for me:

import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
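A minimal illustration of what the r prefix changes, using a hypothetical path (not the one from this answer). It compares a regular literal, a raw literal, and a forward-slash version normalised with pathlib.

```python
from pathlib import PureWindowsPath

# Hypothetical path, for illustration only.
# In a regular literal "\\" is one backslash, but "\a" is the BEL control
# character (U+0007), so "\aaaa.htm" silently becomes BEL + "aaa.htm".
escaped = "F:\\tables\aaaa.htm"
raw = r"F:\tables\aaaa.htm"        # raw literal: backslashes kept as typed
forward = "F:/tables/aaaa.htm"     # forward slashes are also accepted on Windows

print(repr(escaped))                          # 'F:\\tables\x07aaa.htm'
print(repr(raw))                              # 'F:\\tables\\aaaa.htm'
print(escaped == raw)                         # False
print(str(PureWindowsPath(forward)) == raw)   # True
```

Either the raw literal or forward slashes avoid the problem; doubling every backslash by hand also works but is easy to get wrong.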


I ran into this error when I gave the wrong path to the local file I was trying to open. So also make sure the path points to the right place!
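pandas' read_html also accepts a string of raw HTML as its io argument, so, depending on the pandas version, a path that does not exist may be parsed as markup instead of raising a missing-file error. A small guard along these lines (a sketch; checked_path is a hypothetical helper, not a pandas function) makes a wrong path fail early with a clear FileNotFoundError.

```python
from pathlib import Path

def checked_path(path):
    """Return `path` as a Path, or raise FileNotFoundError if it is not a file.

    checked_path is a hypothetical helper, not part of pandas.
    """
    p = Path(path).expanduser()
    if not p.is_file():
        # resolve() shows the absolute path that was actually checked
        raise FileNotFoundError(f"not a file: {p.resolve()}")
    return p

# Usage, assuming pandas is installed (pd.read_html accepts path objects):
#   import pandas as pd
#   tables = pd.read_html(checked_path(r"C:\path\to\page.htm"))
```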


On my MacBook I installed it with:

python3 -m pip install html5lib

I also upgraded pip with:

python3.11 -m pip install --upgrade pip

Once that was done, the problem was solved.
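The `python3 -m pip` form works because it ties pip to one specific interpreter. The same idea from inside Python (for example, a notebook cell) is to run pip through sys.executable. A sketch, with hypothetical helper names:

```python
import subprocess
import sys

def pip_install_cmd(*packages):
    """Build a pip command for the interpreter running this code.

    Like `python3 -m pip install ...`, but pinned to sys.executable so the
    packages land where this process imports from. (Hypothetical helper.)
    """
    return [sys.executable, "-m", "pip", "install", *packages]

def pip_install(*packages):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_cmd(*packages))

print(" ".join(pip_install_cmd("html5lib")))
# pip_install("html5lib")  # uncomment to actually install
```

After installing, restart the kernel or interpreter so pandas can see the new package.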


I ran into this error off and on for a couple of months and couldn't keep it failing long enough to troubleshoot. I know the library is installed because this code runs fine most of the time, and reinstalling it had no effect. Today I figured it out.

I was passing a list of HTML files to a function that read the tables into DataFrames. The list had one more entry than it should have: the first filename was duplicated with '~' as its first character. Rather than trying to find out why the extra file was in the list, I added a filter to the list parser that checks each string for '~' and skips it if found. I haven't tested it much yet, but the error stopped after the change.

If anyone knows what caused the extra filename to be created, I'd like to know.
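A sketch of the filter described in this answer, using a hypothetical helper name. As a small variation, it checks only the file name rather than the whole path string, so a directory such as ~/reports is not filtered out by accident. (One possible source of such names is an editor's temporary or lock file; for example, Microsoft Office creates a ~$-prefixed owner file while a document is open. That is an assumption, not a confirmed cause here.)

```python
from pathlib import PurePath

def drop_tilde_files(paths):
    """Keep only paths whose file name does not start with '~'.

    Only the last path component is checked, so '~/reports/page2.htm' is kept.
    drop_tilde_files is a hypothetical helper.
    """
    return [p for p in paths if not PurePath(p).name.startswith("~")]

files = ["page1.htm", "~$page1.htm", "~/reports/page2.htm"]
print(drop_tilde_files(files))  # ['page1.htm', '~/reports/page2.htm']
```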
