I'm looking for an example of how to run a Scrapy script via an HTTP request. I'm planning to send the URL that I need to crawl as a parameter, via a GET or POST method. How can I do that?
ScrapyRT does exactly that: github.com/scrapinghub/scrapyrt (Valdir Stumm Junior, Dec 12, 2018 at 13:23)
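To illustrate the comment above: ScrapyRT exposes the spiders of a Scrapy project over HTTP. A rough sketch of a call, assuming ScrapyRT is running on its default port 9080 inside your project and your spider is named myspider (a placeholder name):

import urllib.parse
import urllib.request

# ScrapyRT's crawl.json resource takes the spider name and the target URL
# as query parameters, which matches the GET use case in the question.
params = urllib.parse.urlencode({
    'spider_name': 'myspider',
    'url': 'http://example.com/page-to-crawl',
})
with urllib.request.urlopen(f'http://localhost:9080/crawl.json?{params}') as resp:
    print(resp.read().decode())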
Biswanath is correct, scrapyd would be very useful for you (Yash Pokar, Dec 14, 2018 at 6:51)
2 Answers
You should use scrapyd (GitHub project page: github.com/scrapy/scrapyd).
Once scrapyd is running, you can use its schedule.json API to schedule a crawl.
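For example, assuming your project has been deployed to scrapyd as myproject and contains a spider named myspider that reads a url argument (all three names are placeholders), a crawl could be scheduled over HTTP like this:

import urllib.parse
import urllib.request

# POST to scrapyd's schedule.json endpoint (default port 6800).
# Extra parameters such as 'url' are passed through to the spider
# as spider arguments.
data = urllib.parse.urlencode({
    'project': 'myproject',
    'spider': 'myspider',
    'url': 'http://example.com/page-to-crawl',
}).encode()
with urllib.request.urlopen('http://localhost:6800/schedule.json', data=data) as resp:
    print(resp.read().decode())

Because scrapyd forwards extra parameters as spider arguments, this also covers the "send the URL as a parameter" part of the question.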
Try something like this:
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

# Route Scrapy's log output through the standard logging module
configure_logging()

# CrawlerRunner runs a crawl inside a Twisted reactor that you manage yourself
runner = CrawlerRunner(get_project_settings())

# crawl() takes the spider class and its arguments, and returns a Deferred
d = runner.crawl(FollowAllSpider, domain='url.com')

# Stop the reactor when the crawl finishes, whether it succeeded or failed
d.addBoth(lambda _: reactor.stop())

# Blocks until reactor.stop() is called above
reactor.run()
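FollowAllSpider here comes from Scrapy's testspiders example project. If you write your own spider instead, it has to accept the parameter passed to crawl(). A minimal sketch of what such a spider could look like (the class body below is an illustration, not the actual testspiders code):

import scrapy

class FollowAllSpider(scrapy.Spider):
    name = 'followall'

    def __init__(self, domain=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Build the start URL from the domain argument passed via crawl()
        self.start_urls = [f'http://{domain}/']

    def parse(self, response):
        # Emit one item per page; extend with link-following as needed
        yield {'url': response.url, 'title': response.css('title::text').get()}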