I'm very new to Python and Scrapy and am trying to write the crawled data to my MySQL database, but I'm running into the following error:

exceptions.AttributeError: 'list' object has no attribute 'encode'

Here's my pipeline code:

import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(user='User', passwd='passwd', db='db', host='host', charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):    
        try:
            self.cursor.execute("""INSERT INTO Teams (Country, CountryFlagLink, TeamWikiURL, MethodOfQualification, DateOfQualification, FinalsAppearance, LastAppearance, PreviousBestPerformance, FifaRankingAsOfOct2013)
                        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""",
                       (item['Country'].encode('utf-8'),
                        item['CountryFlagLink'].encode('utf-8'),
                        item['TeamWikiURL'].encode('utf-8'),
                        item['MethodOfQualification'].encode('utf-8'),
                        item['DateOfQualification'].encode('utf-8'),
                        item['FinalsAppearance'].encode('utf-8'),
                        item['LastAppearance'].encode('utf-8'),
                        item['PreviousBestPerformance'].encode('utf-8'),
                        item['FifaRankingAsOfOct2013'].encode('utf-8')))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

Here's the full stack trace I get when I crawl the site and try to import the data into my MySQL database:

ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Ecuad
r'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/e
e8/Flag_of_Ecuador.svg/23px-Flag_of_Ecuador.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'22'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2006'],
         'MethodOfQualification': [u'CONMEBOL Round Robin 4th place'],
         'PreviousBestPerformance': [u'Round of 16 (2006)'],
         'TeamWikiURL': [u'/wiki/Ecuador_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\mi
dleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\ut
ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] ERROR: Error processing {'Country': [u'Hondu
as'],
         'CountryFlagLink': [u'//upload.wikimedia.org/wikipedia/commons/thumb/8
82/Flag_of_Honduras.svg/23px-Flag_of_Honduras.svg.png'],
         'DateOfQualification': [u'15 October 2013'],
         'FifaRankingAsOfOct2013': [u'34'],
         'FinalsAppearance': [u'3rd'],
         'LastAppearance': [u'2010'],
         'MethodOfQualification': [u'CONCACAF Fourth Round 3rd place'],
         'PreviousBestPerformance': [u'Group stage (1982, 2010)'],
         'TeamWikiURL': [u'/wiki/Honduras_national_football_team']}
        Traceback (most recent call last):
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\mi
dleware.py", line 62, in _process_chain
            return process_chain(self.methods[methodname], obj, *args)
          File "C:\Python27\lib\site-packages\scrapy-0.18.4-py2.7.egg\scrapy\ut
ls\defer.py", line 65, in process_chain
            d.callback(input)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
80, in callback
            self._startRunCallbacks(result)
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
88, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line
75, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "wikitut\pipelines.py", line 16, in process_item
            (item['Country'].encode('utf-8'),
        exceptions.AttributeError: 'list' object has no attribute 'encode'

2013-11-12 19:36:33-0600 [wikitut] INFO: Closing spider (finished)
2013-11-12 19:36:33-0600 [wikitut] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 246,
         'downloader/request_count': 1,
         'downloader/request_method_count/GET': 1,
         'downloader/response_bytes': 72797,
         'downloader/response_count': 1,
         'downloader/response_status_count/200': 1,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2013, 11, 13, 1, 36, 33, 840000),
         'log_count/DEBUG': 7,
         'log_count/ERROR': 22,
         'log_count/INFO': 3,
         'response_received_count': 1,
         'scheduler/dequeued': 1,
         'scheduler/dequeued/memory': 1,

I have a MySQL database set up with all the required fields (all varchar), using the utf8_general_ci collation. I'm at a loss as to why I'm getting the error above. Can someone please explain what I'm doing wrong?

1 Answer

According to your error message, item['Country'] is a list containing one element, not a string; see 'Country': [u'Honduas'] in your log. A list has no encode method, which is why the call fails.

So you need to index into each list first, like this:

(item['Country'][0].encode('utf-8'),
item['CountryFlagLink'][0].encode('utf-8'),
item['TeamWikiURL'][0].encode('utf-8'),
item['MethodOfQualification'][0].encode('utf-8'),
item['DateOfQualification'][0].encode('utf-8'),
item['FinalsAppearance'][0].encode('utf-8'),
item['LastAppearance'][0].encode('utf-8'),
item['PreviousBestPerformance'][0].encode('utf-8'),
item['FifaRankingAsOfOct2013'][0].encode('utf-8')))

I'm not a Python user, so I may be wrong.
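To avoid repeating [0] for every field, the same idea can be wrapped in a small helper. This is only a sketch: first_value and build_insert are hypothetical names, the field list is copied from the INSERT statement in the question, and it assumes each item value is a list of strings as shown in the log. It also generates one %s placeholder per column, since the number of placeholders has to match the number of parameters passed to execute. The explicit .encode('utf-8') from the question can be applied to each returned string if your connection needs it.

```python
# Column order copied from the INSERT statement in the question.
FIELDS = [
    'Country', 'CountryFlagLink', 'TeamWikiURL', 'MethodOfQualification',
    'DateOfQualification', 'FinalsAppearance', 'LastAppearance',
    'PreviousBestPerformance', 'FifaRankingAsOfOct2013',
]


def first_value(item, key, default=u''):
    """Return the first element of a list-valued field (hypothetical helper).

    Falls back to `default` when the field is missing or the list is empty,
    and returns non-list values unchanged.
    """
    values = item.get(key)
    if values is None:
        return default
    if isinstance(values, (list, tuple)):
        return values[0] if values else default
    return values


def build_insert(item, table='Teams'):
    """Build a parameterized INSERT string and its parameter tuple."""
    # One %s placeholder per column, so the counts always match.
    placeholders = ', '.join(['%s'] * len(FIELDS))
    sql = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table, ', '.join(FIELDS), placeholders)
    params = tuple(first_value(item, name) for name in FIELDS)
    return sql, params
```

Inside process_item this would be used as sql, params = build_insert(item) followed by self.cursor.execute(sql, params). An empty string as the default is an arbitrary choice here; dropping the item instead may suit your data better.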

2 Comments

Thanks, it seems to have worked. Can you tell me what the added [0] means? I'm very new to Scrapy and coding, and I want to learn from my mistakes rather than just wing it.
Scrapy is just a framework for Python, and [0] is Python syntax. If a variable is a list (like item['Country']), you can access its first element with [0], its second element with [1], and so on. Why don't you read section 6, on arrays, here: astro.ufl.edu/~warner/prog/python.html
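A short illustration of the indexing described in the comment above, using values taken from the logged items. The variable names are made up for the example.

```python
# A field as it appears in the logged item: a list holding one string.
field = [u'3rd']

# Index 0 selects the first element, which is the string itself.
first = field[0]

# Strings have an encode method; lists do not, hence the AttributeError.
string_can_encode = hasattr(first, 'encode')
list_can_encode = hasattr(field, 'encode')

# Index 1 selects the second element of a longer list.
ranks = [u'22', u'34']
second = ranks[1]

# Indexing an empty list raises IndexError, so [0] is only safe
# when the list is known to contain at least one element.
try:
    [][0]
    empty_ok = True
except IndexError:
    empty_ok = False
```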
