How to call particular Scrapy spiders from another Python script

Question

I have a script called algorithm.py and I want to be able to call Scrapy spiders during the script. The file structure is:

algorithm.py
MySpiders/

where MySpiders is a folder containing several scrapy projects. I would like to create methods perform_spider1(), perform_spider2()... which I can call in algorithm.py.

How do I construct this method?

I have managed to call one spider using the following code. However, it is not wrapped in a method and it only works for one spider. I'm a beginner in need of help!

import sys,os.path
sys.path.append('path to spider1/spider1')
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher
from spider1.spiders.spider1_spider import Spider1Spider

def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)

spider = Spider1Spider()  # the class imported above
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
log.msg('Running reactor...')
reactor.run() # the script will block here
log.msg('Reactor stopped.')

Community · Accepted Answer · 2017-05-23 12:08:57Z

Loop over your spiders and set each one up by calling configure, crawl and start on its own Crawler; only then call log.start() and reactor.run(), once. Scrapy will then run all of the spiders in the same process.

For more info see documentation and this thread.

Also, consider running your spiders via scrapyd.

Hope that helps.


answered Jun 8, 2013 at 11:04

alecxe


5 Comments

David Bailey Over a year ago

Thanks, alecxe! How can I stop the reactor after the last spider? Currently I am using def stop_reactor(): reactor.stop() together with dispatcher.connect(stop_reactor, signal=signals.spider_closed). However, this stops the reactor as soon as the first spider closes...

alecxe Over a year ago

You are welcome. Good question! How about manually keeping track of the spiders that have been closed in stop_reactor, and stopping the reactor only once all of them have closed? Btw, I've edited the answer and included the link to a relevant thread.
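
The bookkeeping suggested in this comment could look roughly like the sketch below. The class name ClosedSpiderCounter and its stop_callback parameter are made up for illustration and are not part of Scrapy. In a real script one would presumably pass reactor.stop as the callback and connect on_spider_closed to signals.spider_closed; here a plain list stands in for the reactor so the snippet runs on its own.

```python
class ClosedSpiderCounter:
    """Call stop_callback once the expected number of spiders has closed."""

    def __init__(self, expected, stop_callback):
        self.expected = expected            # how many spiders were started
        self.closed = 0                     # how many have reported closing so far
        self.stop_callback = stop_callback  # e.g. reactor.stop (assumption)

    def on_spider_closed(self, *args, **kwargs):
        # Signal handlers may be passed extra arguments (spider, reason); ignore them.
        self.closed += 1
        if self.closed == self.expected:
            self.stop_callback()


# Stand-in for reactor.stop so the sketch runs without Scrapy or Twisted.
stopped = []
counter = ClosedSpiderCounter(expected=2, stop_callback=lambda: stopped.append(True))
counter.on_spider_closed()   # first spider closes: nothing happens yet
counter.on_spider_closed()   # second spider closes: the callback fires
```

Keeping the count on an object rather than in module-level variables makes it easier to reuse the same logic for a different number of spiders.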

David Bailey Over a year ago

Thanks, mate. I don't have enough reputation to up-vote you but I morally up-vote you instead :)

praxmon Over a year ago

What do I do if the spiders are in a completely different directory? Will this method still work?

alecxe Over a year ago

@PrakharMohanSrivastava as long as the spiders are importable - it should work too.
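
"Importable" here means that Python can find the spider's package on sys.path. A minimal sketch of one way to arrange that, using only the standard library, is shown below. The helper name load_spider_class is hypothetical, and the directory, module path and class name in the usage comment simply mirror the placeholders from the question.

```python
import importlib
import sys


def load_spider_class(project_dir, module_path, class_name):
    """Import class_name from module_path after making project_dir importable."""
    if project_dir not in sys.path:
        sys.path.append(project_dir)   # same effect as the sys.path.append in the question
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Usage with the question's placeholders (not executed here):
# Spider1Spider = load_spider_class('path to spider1/spider1',
#                                   'spider1.spiders.spider1_spider',
#                                   'Spider1Spider')
```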

David Bailey · Accepted Answer · 2013-06-08 20:52:14Z

Based on the good advice from alecxe, here is a possible solution.

import sys,os.path
sys.path.append('/path/ra_list/')
sys.path.append('/path/ra_event/')
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher
from ra_list.spiders.ra_list_spider import RaListSpider
from ra_event.spiders.ra_event_spider import RaEventSpider

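spider_count = 0        # spiders that have closed so far
number_of_spiders = 2   # spiders started by crawl_resident_advisor()

def stop_reactor_after_all_spiders():
    # Called on every spider_closed signal; stop only after the last spider.
    global spider_count
    spider_count += 1
    if spider_count == number_of_spiders:
        reactor.stop()

dispatcher.connect(stop_reactor_after_all_spiders, signal=signals.spider_closed)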

def crawl_resident_advisor():
    # Reset the counter used by the spider_closed handler.
    global spider_count
    spider_count = 0

    # Each spider gets its own Crawler; nothing runs until the reactor starts.
    for spider in (RaListSpider(), RaEventSpider()):
        crawler = Crawler(Settings())
        crawler.configure()
        crawler.crawl(spider)
        crawler.start()

    log.start()
    log.msg('Running in reactor...')
    reactor.run() # the script will block here until the reactor is stopped
    log.msg('Reactor stopped.')
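
One caveat: a Twisted reactor cannot be restarted once it has stopped, so a function that ends in reactor.run() can be called only once per process. If algorithm.py needs separate perform_spider1(), perform_spider2() calls, one option is to launch each crawl in a child process instead. The sketch below is only that, a sketch. It assumes each folder under MySpiders/ is a regular Scrapy project (with its scrapy.cfg), that Scrapy is installed for the current interpreter, and that the installed version can be started with python -m scrapy. The folder name and spider name are placeholders, and the runner parameter exists only so the function can be exercised without Scrapy.

```python
import subprocess
import sys


def build_crawl_command(spider_name):
    """Return the argument list for `python -m scrapy crawl <spider_name>`."""
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def perform_spider(project_dir, spider_name, runner=subprocess.call):
    """Run one spider in a child process started inside its project folder.

    Returns True when the child process exits with status 0.
    """
    return runner(build_crawl_command(spider_name), cwd=project_dir) == 0


def perform_spider1():
    # Hypothetical project folder and spider name; adjust to the real layout.
    return perform_spider("MySpiders/spider1", "spider1")
```

Because every call starts a fresh process with its own reactor, perform_spider1() and similar wrappers can be called repeatedly and in any order from algorithm.py, at the cost of process start-up time for each crawl.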
