Converting html to excel in Python

Question

I'm trying to convert the table in the following site to an xls table:

http://www.dekel.co.il/madad-lazarchan

The following is the code I came up with from researching:

from bs4 import BeautifulSoup
import pandas as pd
from urllib2 import urlopen
import requests
import csv

url='http://www.dekel.co.il/madad-lazarchan'
table = pd.read_html(requests.get(url).text, attrs={"class" : "medadimborder"})

print table</code>

How can I get it to display the headers properly and output to a csv or xls file?

If I add the following:

table.to_csv('test.csv')

instead of the print row I get this error:

'list' object has no attribute 'to_csv'

Thanks in Advance!

Okay based on the comments maybe I shouldn't use panda or read_html as I want a table and not a list. I wrote the following code but now the printout has delimiters and looks like I lost the header row. Also still not sure how to export it to csv file.

from bs4 import BeautifulSoup import urllib2 import csv soup = BeautifulSoup(urllib2.urlopen('http://www.dekel.co.il/madad-lazarchan').read(), 'html') data = [] table = soup.find("table", attrs={"class" : "medadimborder"}) table_body = table.find('tbody') rows = table_body.findAll('tr') for row in rows: cols = row.findAll('td') cols = [ele.text.strip() for ele in cols] print cols

[u'01/16', u'130.7915', u'122.4640', u'117.9807', u'112.2557', u'105.8017', u'100.5720', u'98.6'] [u'12/15', u'131.4547', u'123.0850', u'118.5790', u'112.8249', u'106.3383', u'101.0820', u'99.1'] [u'11/15', u'131.5874', u'123.2092', u'118.6986', u'112.9387', u'106.4456', u'101.1840', u'99.2']

lowmatic · Accepted Answer · 2016-03-06 19:21:33Z

1

You can use python packages available to work with Excel files. here is a list.

answered Mar 6, 2016 at 19:21

lowmatic

5911 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Brad Martsberger Over a year ago

That link has openpyxl at the top of the list, if you want to easily get your html tables into openpyxl, you can use tablepyxl github.com/martsberger/tablepyxl

JackCid · Accepted Answer · 2020-10-26 16:19:04Z

1

pandas.read_html returns a list of DataFrames instead of a single one. You have to specify the index of the DataFrame in the returned list (index = 0 in this case):

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv

#now the result of read_html will be named 'tables', which is a list of DataFrames
tables = pd.read_html(requests.get(url).text, attrs={"class" : "medadimborder"})
#assigning the first element of the list of DataFrames 'tables' into DataFrame 'table'
table = tables[0]
#converting into csv
table.to_csv('test.csv')

Regards

edited Oct 26, 2020 at 16:19

answered Jan 12, 2018 at 16:20

JackCid

2661 silver badge8 bronze badges

Comments

jDo · Accepted Answer · 2016-03-06 21:07:15Z

0

Your "table" variable is not a pandas dataframe but instead a 2D list whose first and only element is the pandas dataframe. Logically, calling pandas methods on a python list won't work and raises an AttributeError. Python's built-in type() and dir() reveal this:

>>> type(table)
<class 'list'>

>>> type(table[0])
<class 'pandas.core.frame.DataFrame'>

# no error 
>>> table[0].to_csv('test.csv')
>>> 

# 2D to 1D list 
>>> table = table[0]
>>> table.to_csv('test.csv')
>>>

answered Mar 6, 2016 at 21:07

jDo

4,0301 gold badge13 silver badges30 bronze badges

1 Comment

phatrabbitzz Over a year ago

Updated initial question to show new code without panda, as I want a table and not a list.

Collectives™ on Stack Overflow

Converting html to excel in Python

3 Answers 3

1 Comment

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related