I am trying to do the following:
- A spider scrapes the links present on a page of a website.
- It saves the links in a text file.
- Another spider then opens the text file, reads the links, scrapes the individual web pages, and saves the data.
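The hand-off between the two spiders can be sketched roughly as follows. This is only a minimal illustration of the file round trip, not the actual spider code: the helper names `save_links` and `load_links` and the sample URLs are made up for the example, and a temporary directory stands in for wherever `file.txt` really lives.

```python
import os
import tempfile


def save_links(links, path):
    """Write one link per line (what the first spider does with its results)."""
    with open(path, "w") as f:
        for link in links:
            f.write(link + "\n")


def load_links(path):
    """Read the links back, dropping trailing newlines and blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Round trip through a temporary file; the sample links are hypothetical
tmp_dir = tempfile.mkdtemp()
links_file = os.path.join(tmp_dir, "file.txt")
save_links(["http://www.example.com/id=1", "http://www.example.com/id=2"], links_file)
loaded = load_links(links_file)
print(loaded)
```

Note that `open("file.txt")` with a relative path is resolved against the current working directory of the process, so the second spider only finds the file if it is started from the directory where the first one wrote it.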
I am calling these spiders from another Python script that resides in a different directory. The first spider runs correctly without any issues.
The problem is with the second spider.
The source code of the second spider is as follows:
    import scrapy
    from dateutil.parser import parse
    import requests
    from scrapy.http import Request
    from project-name.items import Project-nameItem

    # Read the links saved by the first spider, one per line
    url_list = []
    with open("file.txt", "r") as f:
        for line in f:
            url_list.append(line)

    # Strip the trailing newline from each link
    for i in range(0, len(url_list)):
        url_list[i] = url_list[i].replace('\n','')

    indexList = []
    URL = "http://www.example.com/id=%s"
    number = 0

    class AnotherSpider(scrapy.Spider):
        name = "another"
        allowed_domains = ['example.com']
        start_urls = [URL % number]

        def start_requests(self):
            # Issue one request per link read from the text file
            for i in url_list:
                yield Request(url = URL % i, callback = self.parse)

        def parse(self, response):
            #scrape the page for the required information
            pass
When I call the second spider, the error that I get is:
    runspider: error: Unable to load '/home/project-name/project-name/spiders/anotherspider.py': No module named project-name.items
EDIT
Since the Python script is in a different directory, I am using the runspider command to execute the spiders. The problem is that runspider is a global command, which means the project settings are not taken into account. This is most probably why the items.py file cannot be found.
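The suspected cause can be reproduced outside Scrapy. The sketch below assumes the failure is an ordinary Python import problem: a package is importable only when its parent directory is on `sys.path`. The package name `demo_project_pkg` and the item class are made up for the illustration, and the layout is created in a temporary directory rather than in the real project.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <root>/demo_project_pkg/items.py (hypothetical names)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_project_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "items.py"), "w") as f:
    f.write("class DemoItem(dict):\n    pass\n")


def can_import(module_name):
    """Return True if the module can be imported with the current sys.path."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# The directory containing the package is not on sys.path yet
before = can_import("demo_project_pkg.items")

# Add the directory that contains the package, then try again
sys.path.insert(0, root)
importlib.invalidate_caches()
after = can_import("demo_project_pkg.items")

print(before, after)
```

If the same mechanism is at work here, the import in the spider fails because the directory that contains the project package is not on the path when the spider file is loaded on its own.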
The command used to execute the spiders is as follows:
    scrapy runspider spider1.py
    scrapy runspider spider2.py
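For context, the calling script does something along these lines. This is a simplified sketch rather than the exact script: the function names `build_command` and `run_spiders` are invented for the example, and the optional `cwd` argument is an assumption about how the working directory could be controlled.

```python
import subprocess


def build_command(spider_file):
    """Return the argument list for `scrapy runspider <spider_file>`."""
    return ["scrapy", "runspider", spider_file]


def run_spiders(spider_files, cwd=None):
    """Run each spider file in order, stopping at the first failure."""
    for spider_file in spider_files:
        # check=True raises CalledProcessError if the command exits
        # with a non-zero status
        subprocess.run(build_command(spider_file), cwd=cwd, check=True)


# Show the commands without executing them
commands = [build_command(name) for name in ("spider1.py", "spider2.py")]
print(commands)
```

Calling `run_spiders(["spider1.py", "spider2.py"])` would then execute the two commands one after the other from the script's own working directory.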
I run scrapy runspider spider-name from the script. I am using runspider because another script calls the spider programs, and for that I need global commands. But since the command is global, I cannot use project-specific files. Is there a workaround?