
I am using Scrapy 0.20 with Python 2.7.

I used to pass this in cmd:

 -s JOBDIR=crawls/somespider-1

to handle the duplicated items. Please note that I have already made the changes in the settings.

I don't want to use that in cmd.

Is there a way I can set it in code inside my spider?

  • Edit the topic so that the new one is more relevant. Commented Mar 3, 2014 at 10:18

1 Answer


It's quite easy. Use DropItem in pipelines.py to drop the duplicate items, and use a custom command to set the parameter inside your program (see the pipeline sketch below).
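
For the DropItem part, a minimal duplicates pipeline might look like the sketch below; the ids_seen set and the item['id'] field are assumptions, so adjust them to your own items, and remember to enable the pipeline in ITEM_PIPELINES:

from scrapy.exceptions import DropItem


class DuplicatesPipeline(object):

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        # Drop any item whose 'id' has already been seen (the field name is an assumption)
        if item['id'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item['id'])
        self.ids_seen.add(item['id'])
        return item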

Here is an example of a custom command in Scrapy. Using a custom command (say: scrapy mycommand), you can run the crawl with -s JOBDIR=crawls/somespider-1 without typing it in cmd.

Example:

Create a directory named commands in the same place as your scrapy.cfg file. Inside that directory, create a file mycommand.py.
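
For Scrapy to discover the command, the commands directory also needs to be importable as a Python package and be named in the COMMANDS_MODULE setting. A minimal sketch, assuming the directory sits next to scrapy.cfg as described above (adjust the module path if you put it inside your project package instead):

# settings.py
# Tell Scrapy which package holds your custom commands.
COMMANDS_MODULE = 'commands'

Also add an empty commands/__init__.py so Python 2.7 treats the directory as a package. Then put the following in mycommand.py: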

from scrapy.command import ScrapyCommand  # scrapy.commands in newer Scrapy versions
from scrapy.cmdline import execute


class Command(ScrapyCommand):
    requires_project = True

    def short_desc(self):
        return "This is your custom command"

    def run(self, args, opts):
        # Build the command line you would otherwise type in cmd.
        # Add whatever your syntax needs; here it is the equivalent of
        # "scrapy crawl spider -s JOBDIR=crawls/somespider-1".
        argv = ['scrapy', 'crawl', 'spider',
                '-s', 'JOBDIR=crawls/somespider-1']
        # execute() takes the whole command line as a single list,
        # with the program name as its first element.
        execute(argv)

Now go to the cmd line and type scrapy mycommand. Then your magic is ready :-)


5 Comments

Could you clarify more?
I will give you a sample then,
and I will explain it to you. Just try to figure out how it works, and if you have a problem, you can ask again.
I understood that I can do self.jobdir = 'something', right?
