
I am trying to do the following:

  1. A spider scrapes the links present on a website's page.
  2. It saves the links in a text file.
  3. Another spider opens the text file, reads the links, scrapes the individual web pages, and saves the data.

I am calling these spiders from another python script which resides in a different directory. The first spider is called correctly without any issues; the problem is with the second spider.

The source code of the second spider is as follows:

import scrapy
from scrapy.http import Request
from project-name.items import Project-nameItem  # the import that fails

# Read the links saved by the first spider, one per line,
# stripping the trailing newline from each
url_list = []
with open("file.txt", "r") as f:
    for line in f:
        url_list.append(line.strip())

URL = "http://www.example.com/id=%s"

class AnotherSpider(scrapy.Spider):
    name = "another"

    allowed_domains = ['example.com']

    def start_requests(self):
        for i in url_list:
            yield Request(url=URL % i, callback=self.parse)

    def parse(self, response):
        # scrape the page for the required information
        pass

When I call the second spider, the error that I get is:

runspider: error: Unable to load '/home/project-name/project-name/spiders/anotherspider.py': No module named project-name.items

EDIT

Since the python script is in a different directory, I am using the runspider command to execute the spiders. The problem with this command is that it is a global command, meaning the project settings are not loaded. This is most probably why the script cannot locate the items.py module.

The command used to execute the spiders is as follows:

scrapy runspider spider1.py

scrapy runspider spider2.py

Is there a workaround?
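One workaround, sketched below under the assumption that `/home/project-name` is the directory containing `scrapy.cfg`, is to invoke `scrapy crawl` (which, unlike `runspider`, loads the project settings) from the external script via `subprocess`, with the working directory set to the project root. The path and spider names here are placeholders taken from the question:

```python
import subprocess

# Assumption: the directory that contains scrapy.cfg for this project.
PROJECT_DIR = "/home/project-name"

def crawl_command(spider_name):
    # "scrapy crawl" takes the spider's name attribute,
    # not the path to its .py file.
    return ["scrapy", "crawl", spider_name]

def run_spider(spider_name):
    # cwd=PROJECT_DIR lets scrapy find scrapy.cfg, so the project
    # settings are applied and project-name.items can be imported.
    subprocess.run(crawl_command(spider_name), cwd=PROJECT_DIR, check=True)
```

Alternatively, Scrapy can run spiders in-process from a script via `CrawlerProcess` together with `scrapy.utils.project.get_project_settings()`, which also picks up the project settings as long as the script runs (or `chdir`s) inside the project directory.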

  • please show source of second spider Commented Jan 19, 2015 at 8:19
  • what is the procedure used to invoke the second spider ? Commented Jan 19, 2015 at 8:22
  • @aberna simply calling scrapy runspider spider-name from the script. I am using runspider since I am using another script to call the spider programs, and for that I need global commands. But then, since the command is global, I cannot use project-specific files. Is there a workaround? Commented Jan 19, 2015 at 8:24
  • if the spiders are within the same project have you tried using the command "crawl" instead of "runspider" ? Commented Jan 19, 2015 at 8:28
  • @aberna I edited my previous comment to throw some more light on the error. Hope it helps. Commented Jan 19, 2015 at 8:30
