Fixed it

For an educational project I am trying to store scraped data in an MS SQL database. First, I'd like every unique item to be placed in products_tb. Once a unique product is inserted, SQL Server must generate a unique ID for it, productgroupid. The products_tb table will only hold the product information that never changes: productid, category, name and description. In a second table, which I will create once this works, I will store productgroupid, price and timestamp, because these values change from time to time. With the productgroupid I can then group all the data at any given time and create graphs and so on.
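
A minimal sketch of this two-table layout, as I understand it. It uses Python's built-in sqlite3 as a stand-in for SQL Server purely so that it runs anywhere. The second table's name (prices_tb), the column types and the helper names are assumptions for illustration, not part of my actual database:

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_tb (
        productgroupid INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated ID
        productid      TEXT NOT NULL UNIQUE,
        category       TEXT,
        name           TEXT,
        description    TEXT
    );
    CREATE TABLE prices_tb (  -- hypothetical name for the second table
        productgroupid INTEGER NOT NULL REFERENCES products_tb(productgroupid),
        price          REAL,
        timestamp      TEXT
    );
""")


def get_or_create_group_id(conn, item):
    """Return the productgroupid for item, inserting the product if it is new."""
    row = conn.execute(
        "SELECT productgroupid FROM products_tb WHERE productid = ?",
        (item["productid"],),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO products_tb (productid, category, name, description) "
        "VALUES (?, ?, ?, ?)",
        (item["productid"], item["category"], item["name"], item["description"]),
    )
    return cur.lastrowid


def record_price(conn, item):
    """Store one price observation under the product's productgroupid."""
    group_id = get_or_create_group_id(conn, item)
    conn.execute(
        "INSERT INTO prices_tb (productgroupid, price, timestamp) VALUES (?, ?, ?)",
        (group_id, item["price"], item["timestamp"]),
    )
    conn.commit()
    return group_id
```

Scraping the same product twice should then leave one row in products_tb and two rows in prices_tb, both carrying the same productgroupid.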

The problem is that I cannot get my pipelines.py to work. I did, however, manage to insert data into my SQL database using the commented chunk of code:

#        self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
#                            (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))

It seems to be working with the following code:

pipelines.py

import pymssql

class KrcPipeline(object):

    def __init__(self):
        self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):

#        self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
#                            (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))


        sql_statement = f'''
                    BEGIN
                            IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']})
                            BEGIN
                            INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
                            OUTPUT (Inserted.productgroupid)
                            VALUES ({item['productid']}, '{item['category']}', '{item['name']}', '{item['description']}')
                            END
                            ELSE
                            BEGIN
                                SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']}
                            END
                        END
                '''


        self.cursor.execute(sql_statement)

        self.conn.commit()

        return item
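
The f-string above pastes the scraped values straight into the SQL text, so an apostrophe in a name or description would break the statement. A hedged alternative is to pass the values as query parameters, the way the commented-out execute call does. The sketch below splits the logic into a lookup and an insert; the function name get_or_create_productgroupid is hypothetical, and it assumes cursor is a pymssql cursor that returns rows as tuples:

```python
def get_or_create_productgroupid(cursor, item):
    """Return the productgroupid for item['productid'], inserting the row if needed.

    `cursor` is assumed to be a pymssql cursor, e.g. self.cursor in the pipeline.
    """
    # %s placeholders let the driver quote the values instead of the f-string.
    cursor.execute(
        "SELECT productgroupid FROM [kaercher].[dbo].[products_tb] "
        "WHERE productid = %s",
        (item['productid'],),
    )
    row = cursor.fetchone()
    if row is not None:
        return row[0]

    # OUTPUT makes the INSERT return the generated ID as a one-row result set.
    cursor.execute(
        "INSERT INTO [kaercher].[dbo].[products_tb] "
        "(productid, category, name, description) "
        "OUTPUT Inserted.productgroupid "
        "VALUES (%s, %s, %s, %s)",
        (item['productid'], item['category'], item['name'], item['description']),
    )
    return cursor.fetchone()[0]
```

Inside process_item this could be called as group_id = get_or_create_productgroupid(self.cursor, item) before self.conn.commit(). The two statements are not atomic, which should be acceptable for a single pipeline process but is not safe with concurrent writers.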

items.py

import scrapy


class KrcItem(scrapy.Item):
    productid = scrapy.Field()
    name = scrapy.Field()
    description = scrapy.Field()
    price = scrapy.Field()
    producttype = scrapy.Field()
    timestamp = scrapy.Field()
    category = scrapy.Field()

1 Answer

EDIT:

Another small error I missed: "IF NOT EXIST" needs to be changed to "IF NOT EXISTS".

import pymssql

class KrcPipeline(object):

    def __init__(self):
        self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):

#        self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
#                            (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))


        sql_statement = f'''
                    BEGIN
                            IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']})
                            BEGIN
                            INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
                            OUTPUT (Inserted.productgroupid)
                            VALUES ({item['productid']}, {item['category']}, {item['name']}, {item['description']})
                            END
                            ELSE
                            BEGIN
                                SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']}
                            END
                        END
                '''


        self.cursor.execute(sql_statement)

        self.conn.commit()

        return item

ORIGINAL:

You're not calling the values in the item dictionary correctly when defining sql_statement. Try this:

import pymssql

class KrcPipeline(object):

    def __init__(self):
        self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):

#        self.cursor.execute("INSERT INTO products_tb(productid, category, name, description, price, timestamp) VALUES (%s, %s, %s, %s, %s, %s)",
#                            (item['productid'], item['category'], item['name'], item['description'], item['price'], item['timestamp']))


        sql_statement = f'''
                    BEGIN
                            IF NOT EXIST (SELECT * FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']})
                            BEGIN
                            INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
                            OUTPUT (Inserted.productgroupid)
                            VALUES ({item['productid']}, {item['category']}, {item['name']}, {item['description']})
                            END
                            ELSE
                            BEGIN
                                SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
                                WHERE productid = {item['productid']}
                            END
                        END
                '''


        self.cursor.execute(sql_statement)

        self.conn.commit()

        return item
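
One thing to watch with the statement above: an f-string inserts the value exactly as it is, without SQL quotes. If a column is a varchar, the rendered query contains a bare string and SQL Server reports a syntax error near it. A small illustration with made-up item values:

```python
# Made-up values, only to show how the f-string renders.
item = {'productid': '1.234-567.0', 'category': 'Pressure washers'}

unquoted = f"WHERE productid = {item['productid']}"
quoted = f"WHERE productid = '{item['productid']}'"

print(unquoted)  # WHERE productid = 1.234-567.0
print(quoted)    # WHERE productid = '1.234-567.0'
```

Passing the values as query parameters, as the commented-out execute call does with its %s placeholders, avoids hand-quoting altogether.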

6 Comments

Hello @SamLegesse, thank you for your solution, I really appreciate this, but I am still having the exact same error using the code you provided.
@Bamieschijf I noticed a typo near where you were getting the operational error. See the edited answer.
Hello @SamLegesse, once again I would like to thank you for your feedback. Progress has been made, but now I get a different syntax error. I ran the code through several MySQL syntax checkers, and they all say something is wrong with the first part of the query: IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb] WHERE productid = {item['productid']}). I edited the code in the question (see above).
@Bamieschijf you're using MySQL syntax checkers for a SQL Server query? Based on the new error you're getting now it seems the issue is near productid. Is productid of type varchar or int? If it's varchar you'll need to surround it in quotes.
Hello @SamLegesse, I meant SQL syntax checker; that was a typo on my part. I made a couple of changes and the data is now being stored in my database. I don't know if it is the best solution, but it is working so far. I would like to thank you for your help! :-)