The problem I am facing is that my Scrapy code, specifically the pipeline, raises a programming error: mysql.connector.errors.ProgrammingError: Not all parameters were used in the SQL statement

This is my code for the pipeline:

import csv
from scrapy.exceptions import DropItem
from scrapy import log
import sys
import mysql.connector

class CsvWriterPipeline(object):

    def __init__(self):
        self.connection = mysql.connector.connect(host='localhost', user='test', password='test', db='test')
        self.cursor = self.connection.cursor()

    def process_item(self, item, spider):
        self.cursor.execute("SELECT title, url FROM items WHERE title= %s", item['title'])
        result = self.cursor.fetchone()
        if result:

            log.msg("Item already in database: %s" % item, level=log.DEBUG)
        else:
            self.cursor.execute(
                "INSERT INTO items (title, url) VALUES (%s, %s)",
                (item['title'][0], item['link'][0]))
            self.connection.commit()

            log.msg("Item stored : " % item, level=log.DEBUG)
        return item

    def handle_error(self, e):
        log.err(e)

It gives me this exact error when I run the spider: http://hastebin.com/xakotugaha.py

As you can see, it clearly crawls, so I doubt there is anything wrong with the spider.

I am currently using the Scrapy web crawler with a MySQL database. Thanks for your help.

1 Answer

The error is happening while you are making the SELECT query. There is a single placeholder in the query, but item['title'] is a list of strings - it has multiple values, so the connector receives more parameters than there are placeholders to use them:

self.cursor.execute("SELECT title, url FROM items WHERE title= %s", item['title'])

The root problem is actually coming from the spider. Instead of returning a single item with multiple links and titles, you need to return a separate item for every link and title.


Here is the code of the spider that should work for you:

import scrapy

from scrapycrawler.items import DmozItem


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["snipplr.com"]

    def start_requests(self):
        for i in range(1, 146):
            yield self.make_requests_from_url("https://snipt.net/public/?page=%d" % i)

    def parse(self, response):
        for sel in response.xpath('//article/div[2]/div/header/h1/a'):
            item = DmozItem()
            item['title'] = sel.xpath('text()').extract()
            item['link'] = sel.xpath('@href').extract()
            yield item
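
With one item per link and title, the pipeline's queries line up with the data. For completeness, here is a minimal sketch of how process_item could look after the change (my sketch, not part of the original answer), indexing into the one-element lists that extract() returns:

    def process_item(self, item, spider):
        # each item now carries exactly one title and one link
        title = item['title'][0]
        link = item['link'][0]
        self.cursor.execute("SELECT title, url FROM items WHERE title = %s", (title,))
        if self.cursor.fetchone():
            log.msg("Item already in database: %s" % item, level=log.DEBUG)
        else:
            self.cursor.execute(
                "INSERT INTO items (title, url) VALUES (%s, %s)",
                (title, link))
            self.connection.commit()
            log.msg("Item stored: %s" % item, level=log.DEBUG)
        return item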

3 Comments

I see, how would I go about doing this? It worked before but for some reason stopped working. This is the spider code if you need it: hastebin.com/yalivovifo.py
Ah, is there any difference between for sel in response.xpath('//article/div[2]/div/header/h1/a'): and this? sel = Selector(response)
@CharlieC it's just that you need to yield item instances from inside the loop, for every link. Hope that makes things work.
