
I couldn't find an answer to my problem, so I hope it's OK to ask here.

I am trying to scrape cinema shows and keep getting the following error:

[Screenshot of the error message]

What really confuses me is that the problem apparently lies in the pipeline. However, I have a second spider for an opera house with exactly the same code (only the place is different), and it works just fine. "Shows" and "Place" refer to my Django models. I've changed their fields to CharFields, so it's not a problem with a wrong date/time format.

I also tried using a dedicated Scrapy item, "KikaItem", instead of "ShowItem" (which is shared with my opera spider), but the error remains.

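```python
# Assumed import path: "myapp" is a placeholder for the actual Django app name.
from myapp.models import Place, Shows


class ScrapyKika(object):
    def process_item(self, item, spider):
        place, created = Place.objects.get_or_create(name="kino kika")

        # update_or_create() returns an (object, created) tuple
        show, show_created = Shows.objects.update_or_create(
            time=item["time"],
            date=item["date"],
            place=place,
            defaults={"title": item["title"]},
        )

        return item
```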
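For reference, Django's QuerySet.update_or_create() looks a row up by its keyword arguments, updates it with `defaults` if it exists or creates it otherwise, and returns an (object, created) tuple rather than the object alone. The in-memory stand-in below only mimics that behaviour with plain dicts (the function name and data are hypothetical, and no database is involved); it shows that time, date and place together act as the lookup key, so two items that agree on those three fields end up as a single row.

```python
# In-memory stand-in for Django's update_or_create(), for illustration only.
# Assumption: rows are plain dicts; real Django works on model instances and the database.

def update_or_create(rows, defaults=None, **lookup):
    """Find the row matching every lookup field; update it with defaults or create it.

    Returns (row, created), mirroring the tuple Django returns.
    """
    defaults = defaults or {}
    for row in rows:
        if all(row.get(key) == value for key, value in lookup.items()):
            row.update(defaults)  # existing row: only the defaults are overwritten
            return row, False
    row = {**lookup, **defaults}  # no match: new row built from lookup + defaults
    rows.append(row)
    return row, True


shows = []
# Same time/date/place twice: the second call updates the title instead of adding a row.
first, created_first = update_or_create(
    shows, time="18:00", date="pon", place="kino kika", defaults={"title": "joker"}
)
second, created_second = update_or_create(
    shows, time="18:00", date="pon", place="kino kika", defaults={"title": "parasite"}
)
print(created_first, created_second, len(shows), shows[0]["title"])
```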

Here is my spider code. I suspect the problem is somewhere here, because I used a different approach than in the opera spider. However, I'm not sure what could be wrong.

```python
import scrapy
from ..items import ShowItem, KikaItemLoader

class KikaSpider(scrapy.Spider):
    name = "kika"
    allowed_domains = ["http://www.kinokika.pl/dk.php"]
    start_urls = [
        "http://www.kinokika.pl/dk.php"
    ]

    def parse(self, response):
        divs = response.xpath('//b')
        for div in divs:
            l = KikaItemLoader(item=ShowItem(), response=response)
            l.add_xpath("title", "./text()")
            l.add_xpath("date", "./ancestor::ul[1]/preceding-sibling::h2[1]/text()")
            l.add_xpath("time", "./preceding-sibling::small[1]/text()")
            return l.load_item()
```
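Separately from the error, Scrapy's `allowed_domains` is meant to hold domain names such as www.kinokika.pl, not full URLs; recent Scrapy versions warn about URL entries and ignore them. A minimal sketch of deriving the host name from the start URL with the standard library:

```python
from urllib.parse import urlparse

# allowed_domains expects host names, so take the netloc part of the start URL.
start_url = "http://www.kinokika.pl/dk.php"
allowed_domains = [urlparse(start_url).netloc]
print(allowed_domains)  # ['www.kinokika.pl']
```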

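And my ItemLoader:

```python
from scrapy.loader import ItemLoader
from itemloaders.processors import Join, MapCompose  # older Scrapy: scrapy.loader.processors

# strip_string and lowercase are custom helper functions defined elsewhere (not shown)


class KikaItemLoader(ItemLoader):
    title_in = MapCompose(strip_string, lowercase)
    title_out = Join()

    time_in = MapCompose(strip_string)
    time_out = Join()

    date_in = MapCompose(strip_string)
    date_out = Join()
```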
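To make the processor declarations above concrete, here is a plain-Python sketch that mimics how MapCompose (input) and Join (output) treat the extracted strings. It is not Scrapy's implementation (the real MapCompose also flattens list results, omitted here), and `strip_string`/`lowercase` are given assumed bodies because the question doesn't show them.

```python
# Plain-Python stand-ins that mimic the loader's processors, for illustration only.

def strip_string(value):
    # Hypothetical body: the question does not show this helper; assumed to call str.strip().
    return value.strip()


def lowercase(value):
    # Hypothetical body: assumed to call str.lower().
    return value.lower()


def map_compose(*functions):
    """Apply each function to every value in turn, dropping None results (like MapCompose)."""
    def process(values):
        for function in functions:
            next_values = []
            for value in values:
                result = function(value)
                if result is not None:
                    next_values.append(result)
            values = next_values
        return values
    return process


def join(separator=" "):
    """Join all collected values into one string (like Join, default separator ' ')."""
    def process(values):
        return separator.join(values)
    return process


title_in = map_compose(strip_string, lowercase)
title_out = join()

print(repr(title_out(title_in(["  Joker  "]))))        # a real text node
print(repr(title_out(title_in(["\n", "\n", "\n"]))))  # whitespace-only text nodes
```

The second call shows that whitespace-only text nodes survive stripping as empty strings, and the join step then turns them into a title made only of spaces.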

Thank you for your time and sorry for any misspellings :)

Answer


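Currently, your spider yields a single item:

```python
{'title': u'  '}
```

which does not have the date and time fields filled out. This is because of the way you initialize the ItemLoader class in your spider: with `response=response`, the loader's selector is the whole page, so `./text()` and the other relative XPath expressions are evaluated from the page root rather than from the current `<b>` element (`div` is never used inside the loop). This would also explain why the error appears to come from the pipeline: it reads `ShowItem["time"]` and `ShowItem["date"]`, and a Scrapy item raises `KeyError` for fields that were never set.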
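The same idea can be seen with the standard library's ElementTree, used here only as an analogy: it is not Scrapy's selector API and supports just a subset of XPath (no `preceding-sibling` axis, for example). A relative path is resolved from whatever element you start at. The markup below is hypothetical and only loosely shaped like a showtimes list; the real kinokika.pl page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure may differ.
PAGE = """<html><body>
<h2>Monday</h2>
<ul>
  <li><small>18:00</small> <b> Joker </b></li>
  <li><small>20:30</small> <b> Parasite </b></li>
</ul>
</body></html>"""

root = ET.fromstring(PAGE)

# Context = the page root: './small' only looks at <html>'s direct children, so it finds nothing.
print(root.findall("./small"))  # []

# Context = each <li>: the same relative path now resolves inside that element.
for li in root.iter("li"):
    print(li.find("./small").text, li.find("./b").text.strip())
```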

Initialize the item loader with the selector of the current element instead. Replace:

```python
for div in divs:
    l = KikaItemLoader(item=ShowItem(), response=response)
```

with:

```python
for div in divs:
    l = KikaItemLoader(item=ShowItem(), selector=div)
```
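The spider's `parse` also ends with `return l.load_item()` inside the loop, which exits after the first `<b>` element, so only one item is ever produced. Scrapy accepts a generator from a callback, so `yield l.load_item()` would emit one item per element. A plain-Python sketch of the difference (the function names are hypothetical):

```python
def parse_with_return(elements):
    for element in elements:
        # return exits the function on the first iteration, so only one item is produced
        return {"title": element}


def parse_with_yield(elements):
    for element in elements:
        # yield hands back one item per element and keeps looping
        yield {"title": element}


titles = ["joker", "parasite"]
print(parse_with_return(titles))        # {'title': 'joker'}
print(list(parse_with_yield(titles)))  # [{'title': 'joker'}, {'title': 'parasite'}]
```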

Comments

I can't add a screenshot to a comment, but it now says "text = response.text AttributeError: text".
@Grevioos I tested the code before posting the answer; it works perfectly for me, and I am pretty sure this is the solution. Please recheck the suggested code. Thanks.
I had to drop my database and recreate the spider, but now it works fine with your solution. Thanks :)
