
I get this error when trying to read a table from a URL (the link is in the code below).

Here is the code:

import pandas as pd
link = "http://www.checkee.info/main.php?dispdate="
c=pd.read_html(link)

The error returned is: AttributeError: 'module' object has no attribute '_base'

Here is the full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-5e6036f08795> in <module>()
      1 link = "http://www.checkee.info/main.php?dispdate="
----> 2 c=pd.read_html(link)

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    859     pandas.read_csv
    860     """
--> 861     _importers()
    862 
    863     # Type check here. We don't want to parse only to fail because of an

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in _importers()
     40 
     41     try:
---> 42         import bs4  # noqa
     43         _HAS_BS4 = True
     44     except ImportError:

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/__init__.py in <module>()
     28 import warnings
     29 
---> 30 from .builder import builder_registry, ParserRejectedMarkup
     31 from .dammit import UnicodeDammit
     32 from .element import (

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/__init__.py in <module>()
    312 register_treebuilders_from(_htmlparser)
    313 try:
--> 314     from . import _html5lib
    315     register_treebuilders_from(_html5lib)
    316 except ImportError:

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/_html5lib.py in <module>()
     68 
     69 
---> 70 class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
     71 
     72     def __init__(self, soup, namespaceHTMLElements):

AttributeError: 'module' object has no attribute '_base'

Does anyone know what causes this problem? Thanks!

2 Answers


I've just had the same problem and came across a solution on GitHub. For completeness, the relevant comment there was:

This is an issue with upstream package html5lib ... To fix, force downgrade to an older version:

pip install --upgrade html5lib==1.0b8

This solved the problem for me.
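To check whether your environment is in this situation, here is a small diagnostic sketch (Python 3, standard library only). It rests on an assumption worth stating loudly: my understanding is that later html5lib releases renamed the private treebuilders._base module to treebuilders.base, so older bs4 code that references html5lib.treebuilders._base (line 70 of bs4/builder/_html5lib.py in the traceback) fails with exactly this AttributeError. The helper name has_submodule is hypothetical, just for illustration.

```python
import importlib.util


def has_submodule(name: str) -> bool:
    """Return True if the dotted module path can be located, False otherwise."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. html5lib itself) is not installed.
        return False


if __name__ == "__main__":
    # Assumption: older html5lib ships "_base", newer releases ship "base".
    for mod in ("html5lib.treebuilders._base", "html5lib.treebuilders.base"):
        print(mod, "found" if has_submodule(mod) else "missing")
```

If only the "base" line reports found, the installed html5lib is newer than what this bs4 expects; the downgrade above (or, presumably, upgrading beautifulsoup4 to a release that uses the new name) should resolve it.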


2 Comments

I keep encountering this issue and spending ages finding the solution. To save my future self some trouble, I'm leaving this comment here so I remember that THIS is the solution that really worked.
I tried pip install --upgrade html5lib==1.0b, but it failed with "Could not find a version that satisfies the requirement". Then I tried pip install --upgrade html5lib==1.0b1 and that fixed the issue.
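The "Could not find a version" error presumably means no published release matches the exact version requested (pip reads 1.0b as 1.0b0). After a pin does succeed, a quick way to confirm which versions are actually installed is a sketch like the one below. It assumes Python 3.8+ (for importlib.metadata); the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # After a pin such as "pip install --upgrade html5lib==1.0b8",
    # the first line should show the pinned version string.
    print("html5lib:", installed_version("html5lib"))
    print("beautifulsoup4:", installed_version("beautifulsoup4"))
```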

Not sure why you're running into that problem, but I would try using BeautifulSoup to select the table you're interested in, and pass that to read_html() as a string. For example:

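import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://www.checkee.info/main.php?dispdate="
res = requests.get(url)
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(res.content, 'lxml')

# Select the table you're interested in; index 7 is specific to this page's layout
table = soup.find_all('table')[7]
# read_html returns a list of DataFrames; take the first one.
# (Recent pandas versions deprecate literal HTML strings; wrap it in io.StringIO there.)
df = pd.read_html(str(table))[0]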
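One caveat: in the traceback, the error is raised while bs4 itself is being imported (bs4/builder/__init__.py tries to import _html5lib), so "from bs4 import BeautifulSoup" may hit the same AttributeError in that environment. If you only need the cell text, a dependency-free alternative (a swapped-in technique, not what this answer uses) is the standard library's html.parser.HTMLParser. The sketch below assumes Python 3 and reasonably well-formed HTML with explicit closing tags; the names TableCollector and read_tables are hypothetical. Nested tables are kept separate, listed in the order their opening tags appear.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        # One entry per currently open table: [rows, text fragments of open cell or None]
        self._stack: List[list] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            rows: List[List[str]] = []
            self.tables.append(rows)
            self._stack.append([rows, None])
        elif tag == "tr" and self._stack:
            self._stack[-1][0].append([])
        elif tag in ("td", "th") and self._stack:
            self._stack[-1][1] = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._stack and self._stack[-1][1] is not None:
            rows, cell = self._stack[-1]
            if not rows:  # cell appeared without a <tr>; start a row for it
                rows.append([])
            rows[-1].append("".join(cell).strip())
            self._stack[-1][1] = None
        elif tag == "table" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Only text inside an open cell of the innermost open table is kept.
        if self._stack and self._stack[-1][1] is not None:
            self._stack[-1][1].append(data)


def read_tables(html: str) -> List[List[List[str]]]:
    """Parse an HTML string and return all of its tables as nested lists."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The rows of the table you want can then be handed to pandas, e.g. pd.DataFrame(rows[1:], columns=rows[0]) when the first row is a header.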
