
This piece of code is giving me an error:

Code:

import pandas as pd

fiddy_states = pd.read_html("https://simple.wikipedia.org/wiki/List_of_U.S._states")

Error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-9-87a39d7446f6> in <module>()
      1 import pandas as pd
----> 2 df_states = pd.read_html('http://www.50states.com/abbreviations.htm#.Vmz0ZkorLIU')

C:\Anaconda3\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    864     _validate_header_arg(header)
    865     return _parse(flavor, io, match, header, index_col, skiprows,
--> 866                   parse_dates, tupleize_cols, thousands, attrs, encoding)

C:\Anaconda3\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    716     retained = None
    717     for flav in flavor:
--> 718         parser = _parser_dispatch(flav)
    719         p = parser(io, compiled_match, attrs, encoding)
    720 

C:\Anaconda3\lib\site-packages\pandas\io\html.py in _parser_dispatch(flavor)
    661     if flavor in ('bs4', 'html5lib'):
    662         if not _HAS_HTML5LIB:
--> 663             raise ImportError("html5lib not found, please install it")
    664         if not _HAS_BS4:
    665             raise ImportError("BeautifulSoup4 (bs4) not found, please install it")

ImportError: html5lib not found, please install it

This happens even though I have the html5lib, lxml and BeautifulSoup4 libraries installed and up to date.
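A common cause of this error is that the packages were installed into a different Python environment than the one pandas is running in. A quick diagnostic sketch (standard library only; the package names match the ones the traceback checks for):

```python
import importlib.util
import sys

# Show which interpreter is actually running -- pip may have installed
# html5lib into a different environment than this one
print(sys.executable)

# Check whether each parser backend pandas.read_html can use is importable
for pkg in ("html5lib", "bs4", "lxml"):
    found = importlib.util.find_spec(pkg) is not None
    print("{}: {}".format(pkg, "found" if found else "MISSING"))
```

If any of these report MISSING, install them with the pip/conda that belongs to the interpreter printed on the first line.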

  • Please format your code properly. Commented Dec 3, 2015 at 6:38
  • Please provide full stacktrace Commented Dec 3, 2015 at 7:14
  • Try to import html5lib in a Python console. Does it work? Commented Dec 3, 2015 at 9:57
  • @AntonProtopopov I tried, but that also gives me an error even though I have it installed. Any ideas? Commented Dec 13, 2015 at 4:46
  • @DeepSpace I just did. Could you please help me out? Commented Dec 13, 2015 at 4:49

2 Answers


Consider parsing the HTML table with lxml using XPath expressions and then assembling the resulting lists into a data frame:

import urllib.request as rq
import lxml.etree as et
import pandas as pd

# DOWNLOAD WEB PAGE CONTENT
rqpage = rq.urlopen('https://simple.wikipedia.org/wiki/List_of_U.S._states')
txtpage = rqpage.read()
dom = et.HTML(txtpage)

# XPATH EXPRESSIONS TO LISTS (SKIPPING HEADER ROW)
abbreviation = dom.xpath("//table[@class='wikitable']/tr[position()>1]/td[1]/b/text()")
state = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[2]/a/text()")
capital = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[3]/a/text()")
incorporated = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[4]/text()")

# CONVERT LISTS TO DATA FRAME
df = pd.DataFrame({'Abbreviation':abbreviation,
                   'State':state,
                   'Capital':capital,
                   'Incorporated':incorporated})

print(df.head())

#   Abbreviation      Capital       Incorporated       State
#0            AL   Montgomery  December 14, 1819     Alabama
#1            AK       Juneau    January 3, 1959      Alaska
#2            AZ      Phoenix  February 14, 1912     Arizona
#3            AR  Little Rock      June 15, 1836    Arkansas
#4            CA   Sacramento  September 9, 1850  California
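Alternatively, if lxml itself imports fine, read_html can be told to use it directly via the flavor argument, sidestepping the html5lib/bs4 dependency entirely. A minimal sketch (parsing an inline HTML snippet here instead of the live URL; the same flavor='lxml' argument works when passing a URL):

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>Abbreviation</th><th>State</th></tr>
  <tr><td>AL</td><td>Alabama</td></tr>
  <tr><td>AK</td><td>Alaska</td></tr>
</table>
"""

# flavor='lxml' restricts read_html to the lxml parser, so html5lib and
# BeautifulSoup4 do not need to be installed
tables = pd.read_html(StringIO(html), flavor="lxml")
df = tables[0]
print(df)
```

read_html always returns a list of DataFrames, one per table found, so index into the result to get the table you want.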



Try using conda to install html5lib instead of pip. That worked for me.
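For instance, from the Anaconda prompt (assuming the Anaconda install shown in the traceback paths; restart the notebook kernel afterwards so the new package is picked up):

```shell
# Install html5lib into the active conda environment
conda install html5lib

# Verify it is importable by that environment's Python
python -c "import html5lib; print(html5lib.__version__)"
```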

