I'm building a web scraper with Python 2.6.4 and the Scrapy toolkit. I need it for data analysis, and it's also my first Python learning project. I'm having trouble creating the SQL INSERT statement in my pipelines.py. The real query has approximately 30 attributes to insert.

First, is there a better way to write this UPDATE or INSERT algorithm? Open to improvements.

Second, here are two syntax variations and the errors they produce. I've tried lots of variations based on examples, but I can't find an example that uses "INSERT SET" broken across multiple lines. What is the proper syntax?

The DB is empty, so we always branch to the INSERT block for now.

def _conditional_insert(self, tx, item):
 # create record if doesn't exist.
 tx.execute("SELECT username  FROM profiles_flat WHERE username = %s", (item['username'][0], ))
 result = tx.fetchone()

 if result:
    # do row UPDATE
     tx.execute( \
        """UPDATE profiles_flat SET
        username=`%s`, 
        headline=`%s`,
        age=`%s`
        WHERE username=`%s`""", (  \
        item['username'],
        item['headline'],
        item['age'],
        item['username'],)
     )         
 else: 
   # do row INSERT
   tx.execute( \
   """INSERT INTO profiles_flat SET
        username=`%s`, 
        headline=`%s`,
        age=`%s` """, ( \
        item['username'],
        item['headline'],
        item['age'], )   # line 222
   )

Error:

[Failure instance: Traceback: <class '_mysql_exceptions.OperationalError'>: (1054, "Unknown column ''missLovely92 '' in 'field list'")
  /usr/lib/python2.6/threading.py:497:__bootstrap
  /usr/lib/python2.6/threading.py:525:__bootstrap_inner
  /usr/lib/python2.6/threading.py:477:run
  --- <exception caught here> ---
  /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
  /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
  /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
  /usr/lib/python2.6/vendor-packages/twisted/enterprise/adbapi.py:429:_runInteraction
  /export/home/raven/scrapy/project/project/pipelines.py:222:_conditional_insert
  /usr/lib/python2.6/vendor-packages/MySQLdb/cursors.py:166:execute
  /usr/lib/python2.6/vendor-packages/MySQLdb/connections.py:35:defaulterrorhandler
  ]

Alternative Syntax:

  query = """INSERT INTO profiles_flat SET
        username=`%s`, 
        headline=`%s`,
        age=`%s` """ % \
   item['username'], # line 196
   item['headline'],
   item['age']

   tx.execute(query)

Error:

  [Failure instance: Traceback: <type 'exceptions.TypeError'>: not enough arguments for format string
  /usr/lib/python2.6/threading.py:497:__bootstrap
  /usr/lib/python2.6/threading.py:525:__bootstrap_inner
  /usr/lib/python2.6/threading.py:477:run
  --- <exception caught here> ---
  /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
  /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
  /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
  /usr/lib/python2.6/vendor-packages/twisted/enterprise/adbapi.py:429:_runInteraction
  /export/home/raven/scrapy/project/project/pipelines.py:196:_conditional_insert
  ]    

1 Answer

You shouldn't surround values with backticks. Backticks are used to quote column names.

INSERT INTO profiles_flat (username, headline, age)
VALUES (%s, %s, %s)
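
As a rough sketch of how a statement like the one above might be built for roughly 30 attributes, the query text and the parameter tuple can both be derived from the item's keys. The helper name `build_insert` and the sample item are hypothetical. The sketch assumes each value is a plain scalar; the question's SELECT uses `item['username'][0]`, which suggests the fields hold lists, so you may need to take `[0]` first. Table and column names are interpolated directly, so they must come from trusted code.

```python
def build_insert(table, item):
    """Return (query, params) for a parameterized INSERT.

    Hypothetical helper: table and column names are interpolated directly,
    so they must come from trusted code, never from scraped data.
    """
    columns = sorted(item.keys())  # fixed order so columns and values line up
    placeholders = ", ".join(["%s"] * len(columns))
    query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)
    params = tuple(item[col] for col in columns)
    return query, params


# Sample data for illustration only; assumes scalar values.
item = {"username": "missLovely92", "headline": "hello", "age": 25}
query, params = build_insert("profiles_flat", item)
# Inside the pipeline this would be passed as: tx.execute(query, params)
```

The driver then quotes and escapes each value itself, so the SQL text needs no quotes or backticks around `%s`.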

4 Comments

In fact, you shouldn't be surrounding them with any kind of quoting at all!
After removing the backticks from the first syntax I posted, I get the following error: [Failure instance: Traceback: <class '_mysql_exceptions.ProgrammingError'>: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'),\n ... Also, I chose the 'INSERT INTO .. SET' syntax over 'INSERT INTO .. VALUES' because of the query's similarity to the UPDATE statement. Maybe this can't be done in Python?
@Garrick: If you are having further problems, please post your new code. I guess the problem is that you are using string interpolation instead of parameterized queries. You should use tx.execute(query, params) not tx.execute(query % params).
Thanks for the help! I got dragged away from this for a few days but will post back when I finish it!
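
For reference, the `TypeError` from the alternative syntax in the question can be reproduced without a database. This is a minimal sketch with made-up sample values: `%` binds more tightly than the comma, so only the first value reaches the format string unless the values are wrapped in a tuple. Formatting the values in yourself like this also skips the driver's escaping, which is why the comment above recommends `tx.execute(query, params)`.

```python
template = "username=%s, headline=%s, age=%s"
values = ("missLovely92", "hello", 25)  # sample values, for illustration only

# Parsed as (template % values[0]), values[1], values[2]:
# the format string sees one argument but has three placeholders.
try:
    broken = template % values[0], values[1], values[2]
except TypeError as exc:
    error_message = str(exc)

# Wrapping the values in a single tuple supplies all three placeholders.
formatted = template % values
```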
