Error in reading html to data frame in Python "'module' object has no attribute '_base'"

Question

I get this error when trying to read a table from a URL into a data frame.

Here is the code:

import pandas as pd
link = "http://www.checkee.info/main.php?dispdate="
c=pd.read_html(link)

The error returned is: AttributeError: 'module' object has no attribute '_base'

The full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-5e6036f08795> in <module>()
      1 link = "http://www.checkee.info/main.php?dispdate="
----> 2 c=pd.read_html(link)

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    859     pandas.read_csv
    860     """
--> 861     _importers()
    862 
    863     # Type check here. We don't want to parse only to fail because of an

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/pandas/io/html.pyc in _importers()
     40 
     41     try:
---> 42         import bs4  # noqa
     43         _HAS_BS4 = True
     44     except ImportError:

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/__init__.py in <module>()
     28 import warnings
     29 
---> 30 from .builder import builder_registry, ParserRejectedMarkup
     31 from .dammit import UnicodeDammit
     32 from .element import (

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/__init__.py in <module>()
    312 register_treebuilders_from(_htmlparser)
    313 try:
--> 314     from . import _html5lib
    315     register_treebuilders_from(_html5lib)
    316 except ImportError:

/Users/lanyiyun/anaconda/lib/python2.7/site-packages/bs4/builder/_html5lib.py in <module>()
     68 
     69 
---> 70 class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
     71 
     72     def __init__(self, soup, namespaceHTMLElements):

AttributeError: 'module' object has no attribute '_base'

Does anyone know what causes this problem? Thanks!

mindshoot · Accepted Answer · 2016-08-03 23:27:36Z

9

I just had the same problem and came across a solution on a GitHub page. For completeness, the comment/answer there was:

This is an issue with upstream package html5lib ... To fix, force downgrade to an older version:

pip install --upgrade html5lib==1.0b8

This solved the problem for me.
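
The traceback fails on `html5lib.treebuilders._base`, which suggests the installed html5lib no longer ships that private module. Here is a minimal sketch for checking this before and after the downgrade. The function name `html5lib_treebuilder_layout` is hypothetical, and the idea that newer releases expose `base` in place of `_base` is an assumption on my part, not something stated in the quoted comment.

```python
import importlib


def html5lib_treebuilder_layout():
    """Report which treebuilder base module the installed html5lib exposes.

    Returns "not installed", "legacy (_base)", "renamed (base)", or "unknown".
    """
    try:
        importlib.import_module("html5lib")
    except ImportError:
        return "not installed"

    # The traceback shows bs4 reaching for html5lib.treebuilders._base.
    # Assumption: releases without `_base` expose a module named `base` instead.
    for module_name, label in (("_base", "legacy (_base)"), ("base", "renamed (base)")):
        try:
            importlib.import_module("html5lib.treebuilders." + module_name)
        except ImportError:
            continue
        return label
    return "unknown"


if __name__ == "__main__":
    print(html5lib_treebuilder_layout())
```

If this reports the legacy layout after the downgrade, the old bs4 import in the traceback should find the attribute it expects. Upgrading bs4 may be another way out, but I have not verified that here.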


2 Comments

abanana Over a year ago

I keep encountering this issue and spending ages on finding the solution. To save my future self some trouble, I'm going to leave this comment here so I remember that THIS is the solution that really worked.

mgokhanbakal Over a year ago

I used pip install --upgrade html5lib==1.0b but it gave "Could not find a version that satisfies the requirement". Then I tried pip install --upgrade html5lib==1.0b1 and it solved the issue.

user666 · Accepted Answer · 2016-07-29 05:55:01Z

0

Not sure why you're running into that problem, but I would try using BeautifulSoup to select the table you're interested in, and pass that to read_html() as a string. For example:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://www.checkee.info/main.php?dispdate="
res = requests.get(url)  # fetch the raw page
soup = BeautifulSoup(res.content, 'lxml')  # parse it with the lxml parser

table = soup.find_all('table')[7]  # select the table you're interested in
df = pd.read_html(str(table))[0]  # read_html returns a list of DataFrames

