
I have spent hours trying to get around this but still cannot make it work properly. I am using Scrapy to scrape data from a website and then trying to insert it into a MySQL database. Here is my database code:

import MySQLdb


class Database:

    host = 'localhost'
    user = 'root'
    password = 'test123'
    db = 'scraping_db'

    def __init__(self):
        self.connection = MySQLdb.connect(self.host, self.user, self.password, self.db, use_unicode=True, charset="utf8")
        self.cursor = self.connection.cursor()

    def insert(self, query, params):
        try:
            self.cursor.execute(query, params)
            self.connection.commit()
        except Exception as ex:
            # Roll back on any failure; the error itself is not re-raised.
            self.connection.rollback()

    def __del__(self):
        self.connection.close()

Here is my pipeline code, where I build the insert query and pass it to the insert method of the class above:

from con import Database


class LinkPipeline(object):

    def __init__(self):
        self.db=Database()

    def process_item(self, item, spider):
        query="""INSERT INTO links (title, location,company_name,posted_date,status,company_id,scraped_link,content,detail_link,job_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,%s)"""
        params=(item['title'], item['location'], item['company_name'], item['posted_date'], item['status'], item['company_id'], item['scraped_link'], item['content'], item['detail_link'],item['job_id'])
        self.db.insert(query,params)
        return item

This works fine on my local machine, but on the server I get the following error:

1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \')

When I print the params and query from exception block I have this:

query variable:

INSERT INTO links (title, location,company_name,posted_date,status,company_id,scraped_link,content,detail_link,job_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,%s)

params variable:

((u'Account Leader, Sr',), (u'Sydney',), (u'\n    Halliburton',), (datetime.datetime(2018, 4, 9, 21, 55, 46, 789575),), ('Pending',), ([u'0e4554ac6dcff427'],), (u'https://www.site.com.au/rc/clk?jk=3f41218887882940&fccid=0e4554ac6dcff427&vjs=3',), 'Job Content', 'https://jobs.halliburton.com/job/Account-Leader%2C-Sr-IS/437741300/?feedId=162400', ([u'3f41218887882940'],))

I feel that the tuple data is the culprit, with the MySQL string breaking somewhere due to quotes. But I am very new to Python and not sure. I checked another question on SO and followed this syntax for inserting into the MySQL database, i.e.:

self.db.insert(query,params)

The above code works fine on my local machine but fails on the server. Please guide me in the right direction. Thank you very much!

  • Well it doesn't like the format of your newline '\n Halliburton'. This is what your issue is. Perhaps this might be of assistance.. Unsure but maybe python has a way to convert newlines to page breaks. Commented Apr 9, 2018 at 17:58

1 Answer


It very much looks like the tuple encapsulation is your issue. What is the output of:

print( repr( item['location'] ))

That's "print the (coder's) representation of item['location']" (rather than trying to be smart about printing).

>>> print( repr( item['location'] ))
('Sydney',)     # A tuple, 1-long, containing a string

>>> print( repr( item['location'] ))
'Sydney'        # A string

If it's the first, then your passed data structure inside of item apparently has an extra layer of encapsulation for which your code does not account. The quick and dirty approach to get you up and running:

def process_item(self, item, spider):
    query="""INSERT INTO links (title, location,company_name,posted_date,status,company_id,scraped_link,content,detail_link,job_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,%s)"""
    params=(item['title'][0], item['location'][0], ...
    self.db.insert(query,params)
    return item

Note that this is hardly a robust solution, API-wise: what happens if one of those embedded tuples is zero length? (Hint: indexing it with [0] raises an IndexError.) I've also not filled out the rest, because it looks like you have some elements in item that are not encapsulated at all, and others which are doubly encapsulated.
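A somewhat more defensive variant could peel off the wrappers in one place instead of indexing each field by hand. The following is only a sketch under assumptions: `unwrap`, `build_params`, and `FIELDS` are hypothetical names, and it assumes every field is meant to hold a single scalar, so a container with more than one element is treated as an error rather than guessed at.

```python
# Column order taken from the INSERT statement in the question.
FIELDS = ('title', 'location', 'company_name', 'posted_date', 'status',
          'company_id', 'scraped_link', 'content', 'detail_link', 'job_id')


def unwrap(value, default=None):
    """Peel nested one-element tuples/lists, e.g. ([u'abc'],) -> u'abc'.

    Hypothetical helper: an empty container yields `default`, and a
    container with several elements raises instead of picking one.
    """
    while isinstance(value, (list, tuple)):
        if len(value) == 0:
            return default
        if len(value) > 1:
            raise ValueError("expected at most one element, got %r" % (value,))
        value = value[0]
    return value


def build_params(item):
    """Build the flat parameter tuple for cursor.execute()."""
    return tuple(unwrap(item[name]) for name in FIELDS)
```

Inside process_item, params would then be built with build_params(item) before calling self.db.insert(query, params). Fixing the spider so that it yields plain values in the first place is probably the cleaner long-term option.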

Additionally, you may have some encoding errors with your data after this as some of your elements are unicode and others are not. For example:

(u'Sydney',)  ...    ('Pending',)

You may want to check exactly what your schema requires.
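If mixed byte strings and unicode strings do turn out to matter, one option is to coerce every string to text before the insert. This is a minimal sketch, not part of the original answer: `to_text` is a hypothetical helper, and UTF-8 is assumed only because the connection in the question is opened with charset="utf8".

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; leave every other value untouched.

    Assumption: byte strings are UTF-8 encoded. Values such as
    datetime objects or numbers pass through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

On Python 2, where a plain 'Pending' literal is a byte string, this would turn it into u'Pending'; on Python 3 it only affects bytes objects.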


1 Comment

Thank you very much for your answer. You have really guided me in the right direction with your insight. So it means that my whole 'item' data is in an inconsistent format that needs to be fixed and made consistent for MySQL insertion.
