3

I'm trying to convert the table in the following site to an xls table:

http://www.dekel.co.il/madad-lazarchan

The following is the code I came up with from researching:

from bs4 import BeautifulSoup
import pandas as pd
from urllib2 import urlopen
import requests
import csv

url='http://www.dekel.co.il/madad-lazarchan'
table = pd.read_html(requests.get(url).text, attrs={"class" : "medadimborder"})

print table</code>

How can I get it to display the headers properly and output to a csv or xls file?

If I add the following:

table.to_csv('test.csv')

instead of the print row I get this error:

'list' object has no attribute 'to_csv'

Thanks in Advance!

Okay based on the comments maybe I shouldn't use panda or read_html as I want a table and not a list. I wrote the following code but now the printout has delimiters and looks like I lost the header row. Also still not sure how to export it to csv file.

from bs4 import BeautifulSoup import urllib2 import csv soup = BeautifulSoup(urllib2.urlopen('http://www.dekel.co.il/madad-lazarchan').read(), 'html') data = [] table = soup.find("table", attrs={"class" : "medadimborder"}) table_body = table.find('tbody') rows = table_body.findAll('tr') for row in rows: cols = row.findAll('td') cols = [ele.text.strip() for ele in cols] print cols

[u'01/16', u'130.7915', u'122.4640', u'117.9807', u'112.2557', u'105.8017', u'100.5720', u'98.6'] [u'12/15', u'131.4547', u'123.0850', u'118.5790', u'112.8249', u'106.3383', u'101.0820', u'99.1'] [u'11/15', u'131.5874', u'123.2092', u'118.6986', u'112.9387', u'106.4456', u'101.1840', u'99.2']

3 Answers 3

1

You can use python packages available to work with Excel files. here is a list.

Sign up to request clarification or add additional context in comments.

1 Comment

That link has openpyxl at the top of the list, if you want to easily get your html tables into openpyxl, you can use tablepyxl github.com/martsberger/tablepyxl
1

pandas.read_html returns a list of DataFrames instead of a single one. You have to specify the index of the DataFrame in the returned list (index = 0 in this case):

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv

#now the result of read_html will be named 'tables', which is a list of DataFrames
tables = pd.read_html(requests.get(url).text, attrs={"class" : "medadimborder"})
#assigning the first element of the list of DataFrames 'tables' into DataFrame 'table'
table = tables[0]
#converting into csv
table.to_csv('test.csv') 

Regards

Comments

0

Your "table" variable is not a pandas dataframe but instead a 2D list whose first and only element is the pandas dataframe. Logically, calling pandas methods on a python list won't work and raises an AttributeError. Python's built-in type() and dir() reveal this:

>>> type(table)
<class 'list'>

>>> type(table[0])
<class 'pandas.core.frame.DataFrame'>

# no error 
>>> table[0].to_csv('test.csv')
>>> 

# 2D to 1D list 
>>> table = table[0]
>>> table.to_csv('test.csv')
>>> 

1 Comment

Updated initial question to show new code without panda, as I want a table and not a list.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.