
I'm able to fully scrape the material I need; the problem is that I can't get the data into Excel.

from lxml import html
import requests
import xlsxwriter

page = requests.get('website that gets mined')
tree = html.fromstring(page.content)

items = tree.xpath('//h4[@class="item-title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')
description = tree.xpath('//div[@class="description text"]/text()')
print 'items: ', items
print 'Prices: ', prices
print 'description', description

Everything works fine until this section, where I try to get the data into Excel. This is the error message:

for items,prices,description in (array):
ValueError: too many values to unpack
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x104735e10>> ignored

This is what it was trying to do:

array = [items,prices,description]
workbook   = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0

for items,prices,description in (array):
    worksheet.write(row, col, items)
    worksheet.write(row, col + 1, prices)
    worksheet.write(row, col + 2, description)
    row += 1
workbook.close()
  • Why are you trying to unpack the returned values to write individually? It looks like the library comes with a write_row method (a sketch follows these comments). Your error is telling you that you have more than 3 values, which can't be unpacked into items, prices, description. Commented Feb 19, 2018 at 23:02
  • Using writerow from the csv lib, it's putting all the data in one row; I need it to go down one column. Commented Feb 19, 2018 at 23:16
  • You mean one row, right? You don't increment your row counter until all 3 values are written, suggesting you want a single row. Commented Feb 19, 2018 at 23:18
  • This puts all the data into Excel, but the data is written horizontally, with the first item of items in A, then the second item in B, etc.: "with open('test1.csv', 'wb') as csvfile: spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL) spamwriter.writerow([items]) spamwriter.writerow([prices]) spamwriter.writerow([description])" Commented Feb 19, 2018 at 23:22
  • Then I'm lost on your question, because that appears to be exactly what your "this is what it was trying to do" code would do. Commented Feb 19, 2018 at 23:24
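A minimal sketch of the write_row approach mentioned in the first comment, assuming the three lists come back from the xpath calls in the question (the sample values here are hypothetical stand-ins):

import xlsxwriter

# Hypothetical stand-ins for the scraped lists.
items = ['Widget', 'Gadget']
prices = ['$1.99', '$4.50']
description = ['A widget.', 'A gadget.']

workbook = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()

# Each record becomes one spreadsheet row: item in column A,
# price in column B, description in column C.
for row, record in enumerate(zip(items, prices, description)):
    worksheet.write_row(row, 0, record)

workbook.close()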

2 Answers


Assuming that items, prices, and description all have the same length, you could rewrite the final part of the code as:

for item, price, desc in zip(items, prices, description):
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, price)
    worksheet.write(row, col + 2, desc)
    row += 1

If the lists can have unequal lengths, you should look into alternatives to the zip method (such as itertools.izip_longest, sketched below), but I would be worried about data consistency.
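For instance, a minimal sketch using itertools (izip_longest on Python 2, zip_longest on Python 3), which pads the shorter lists instead of silently truncating:

try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest  # Python 3

# Hypothetical lists of unequal length.
items = ['a', 'b', 'c']
prices = ['1', '2']
description = ['x']

# Missing values are padded with an empty string rather than dropped.
for item, price, desc in zip_longest(items, prices, description, fillvalue=''):
    print('%s | %s | %s' % (item, price, desc))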


2 Comments

This method works, but it's not giving me the full list of results; it gets to 61 instances and stops there.
Check (print) the lengths of all the arrays before the for loop, as in the sketch below.
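For example, using the variable names from the question; zip stops at the shortest input, so one list ending at 61 entries would explain the cutoff:

print('%d %d %d' % (len(items), len(prices), len(description)))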

In all likelihood, it will be easier to write to a CSV file or a text file than to an Excel file.

import urllib2  # Python 2; on Python 3 this is urllib.request

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]

# Build one download URL per ticker symbol.
urls = []
for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

Output_File = open('C:/your_path_here/Data.csv', 'w')

New_Format_Data = ''

for counter in range(0, len(urls)):

    Original_Data = urllib2.urlopen(urls[counter]).read()

    # Keep the header row from the first file only, with an extra
    # "Company" column prepended.
    if counter == 0:
        New_Format_Data = "Company," + urllib2.urlopen(urls[counter]).readline()

    # splitlines(1) keeps the trailing newline on each row.
    rows = Original_Data.splitlines(1)

    # Skip each file's header row and prefix every data row with its ticker.
    for row in range(1, len(rows)):
        New_Format_Data = New_Format_Data + listOfStocks[counter] + ',' + rows[row]

Output_File.write(New_Format_Data)
Output_File.close()

OR

from bs4 import BeautifulSoup
import urllib2  # Python 2; on Python 3 this is urllib.request

# Fetch the page, then close the connection once the HTML is read.
var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
var_file.close()

soup = BeautifulSoup(var_html, 'html.parser')  # name the parser explicitly

text_file = open("C:/your_path_here/Text1.txt", "wb")
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # Write each anchor tag on its own line.
        text_file.write(str(link) + "\r\n")
text_file.close()

As a developer, it's difficult to programmatically manipulate Excel files, since the Excel format is proprietary. This is especially true for languages other than .NET. On the other hand, it's easy to programmatically manipulate CSV files since, after all, they are simple text files.
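To tie this back to the original question, a minimal sketch (Python 2, like the code above) that writes the three scraped lists to a CSV with one record per row, rather than one whole list per row:

import csv

# Hypothetical stand-ins for the scraped lists.
items = ['Widget', 'Gadget']
prices = ['$1.99', '$4.50']
description = ['A widget.', 'A gadget.']

with open('test1.csv', 'wb') as csvfile:  # on Python 3: open('test1.csv', 'w', newline='')
    writer = csv.writer(csvfile)
    # Passing writerow one record at a time keeps items in column A,
    # prices in column B, and descriptions in column C.
    for record in zip(items, prices, description):
        writer.writerow(record)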

