
I'm able to fully scrape the material I need; the problem is that I can't get the data into Excel.

from lxml import html
import requests
import xlsxwriter

page = requests.get('website that gets mined')
tree = html.fromstring(page.content)

items = tree.xpath('//h4[@class="item-title"]/text()')
prices = tree.xpath('//span[@class="price"]/text()')
description = tree.xpath('//div[@class="description text"]/text()')
print 'items: ', items
print 'Prices: ', prices
print 'description', description

Everything works fine until this section, where I try to get the data into Excel. This is the error message:

for items,prices,description in (array):
ValueError: too many values to unpack
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x104735e10>> ignored

This is what it was trying to do:

array = [items,prices,description]
workbook   = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()
row = 0
col = 0

for items,prices,description in (array):
    worksheet.write(row, col, items)
    worksheet.write(row, col + 1, prices)
    worksheet.write(row, col + 2, description)
    row += 1
workbook.close()
  • Why are you trying to unpack the returned values to write individually? It looks like the library comes with a write_row method (a sketch follows these comments). Your error is telling you that you have more than 3 values, which can't be unpacked into items, prices, description. Commented Feb 19, 2018 at 23:02
  • Using writerow from the csv lib, it's putting all the data in one row; I need it to go down one column. Commented Feb 19, 2018 at 23:16
  • You mean one row, right? You don't increment your row counter until all 3 values are written, suggesting you want a single row. Commented Feb 19, 2018 at 23:18
  • This puts all the data into Excel, but the data is written horizontally, with the first item of items in A, then the second item in B, etc.: "with open('test1.csv', 'wb') as csvfile: spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL) spamwriter.writerow([items]) spamwriter.writerow([prices]) spamwriter.writerow([description])" Commented Feb 19, 2018 at 23:22
  • Then I'm lost on your question, because that appears to be exactly what your "this is what it was trying to do" code would do. Commented Feb 19, 2018 at 23:24
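A minimal sketch of the write_row approach mentioned in the first comment, assuming the three lists come back from the xpath calls in the question (the sample values here are hypothetical stand-ins):

import xlsxwriter

# Hypothetical stand-ins for the scraped lists.
items = ['Widget', 'Gadget']
prices = ['$1.99', '$4.50']
description = ['A widget.', 'A gadget.']

workbook = xlsxwriter.Workbook('test1.xlsx')
worksheet = workbook.add_worksheet()

# Each record becomes one spreadsheet row: item in column A,
# price in column B, description in column C.
for row, record in enumerate(zip(items, prices, description)):
    worksheet.write_row(row, 0, record)

workbook.close()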

2 Answers


Assuming that items, prices, and description all have the same length, you could rewrite the final part of the code as:

for item, price, desc in zip(items, prices, description):
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, price)
    worksheet.write(row, col + 2, desc)
    row += 1

If the lists can have unequal lengths, you should look into alternatives to the zip method (such as itertools.izip_longest, sketched below), but I would be worried about data consistency.
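For instance, a minimal sketch using itertools (izip_longest on Python 2, zip_longest on Python 3), which pads the shorter lists instead of silently truncating:

try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest  # Python 3

# Hypothetical lists of unequal length.
items = ['a', 'b', 'c']
prices = ['1', '2']
description = ['x']

# Missing values are padded with an empty string rather than dropped.
for item, price, desc in zip_longest(items, prices, description, fillvalue=''):
    print('%s | %s | %s' % (item, price, desc))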


2 Comments

This method works, but it's not giving me the full list of results; it gets to 61 instances and stops there.
Check (print) the lengths of all the arrays before the for loop, as in the sketch below.
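For example, using the variable names from the question; zip stops at the shortest input, so one list ending at 61 entries would explain the cutoff:

print('%d %d %d' % (len(items), len(prices), len(description)))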

In all likelihood, it will be easier to write to a CSV file or a text file than to an Excel file.

import urllib2  # Python 2; on Python 3 this is urllib.request

listOfStocks = ["AAPL", "MSFT", "GOOG", "FB", "AMZN"]

# Build one download URL per ticker symbol.
urls = []
for company in listOfStocks:
    urls.append('http://real-chart.finance.yahoo.com/table.csv?s=' + company + '&d=6&e=28&f=2015&g=m&a=11&b=12&c=1980&ignore=.csv')

Output_File = open('C:/your_path_here/Data.csv', 'w')

New_Format_Data = ''

for counter in range(0, len(urls)):

    Original_Data = urllib2.urlopen(urls[counter]).read()

    # Keep the header row from the first file only, with an extra
    # "Company" column prepended.
    if counter == 0:
        New_Format_Data = "Company," + urllib2.urlopen(urls[counter]).readline()

    # splitlines(1) keeps the trailing newline on each row.
    rows = Original_Data.splitlines(1)

    # Skip each file's header row and prefix every data row with its ticker.
    for row in range(1, len(rows)):
        New_Format_Data = New_Format_Data + listOfStocks[counter] + ',' + rows[row]

Output_File.write(New_Format_Data)
Output_File.close()

OR

from bs4 import BeautifulSoup
import urllib2  # Python 2; on Python 3 this is urllib.request

# Fetch the page, then close the connection once the HTML is read.
var_file = urllib2.urlopen("http://www.imdb.com/chart/top")
var_html = var_file.read()
var_file.close()

soup = BeautifulSoup(var_html, 'html.parser')  # name the parser explicitly

text_file = open("C:/your_path_here/Text1.txt", "wb")
for item in soup.find_all(class_='lister-list'):
    for link in item.find_all('a'):
        # Write each anchor tag on its own line.
        text_file.write(str(link) + "\r\n")
text_file.close()

As a developer, it's difficult to programmatically manipulate Excel files, since the Excel format is proprietary. This is especially true for languages other than .NET. On the other hand, it's easy to programmatically manipulate CSV files since, after all, they are simple text files.
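To tie this back to the original question, a minimal sketch (Python 2, like the code above) that writes the three scraped lists to a CSV with one record per row, rather than one whole list per row:

import csv

# Hypothetical stand-ins for the scraped lists.
items = ['Widget', 'Gadget']
prices = ['$1.99', '$4.50']
description = ['A widget.', 'A gadget.']

with open('test1.csv', 'wb') as csvfile:  # on Python 3: open('test1.csv', 'w', newline='')
    writer = csv.writer(csvfile)
    # Passing writerow one record at a time keeps items in column A,
    # prices in column B, and descriptions in column C.
    for record in zip(items, prices, description):
        writer.writerow(record)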

