I have a spider whose results I want written to standard output so that they can be read by subprocess.check_output. I don't want to use a file as an intermediary.

I've tried adding the flag '-o', 'stdout' but it doesn't work.

test = subprocess.check_output([
        'scrapy', 'runspider', 'spider.py',
        '-a', f"keywords={keywords}", '-a', f'domain={domain}', '-a', f'page={1}',
        '-s', 'USER_AGENT=Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    ])
  • What does your test variable contain? Please explain why it doesn't work. Commented Feb 8, 2019 at 1:41
  • It's just an empty string because it's not working, but I want it to contain the output of my spider. It works if I add -o test.json at the end of that command, but that writes to a file, which I don't want. Commented Feb 8, 2019 at 3:27
  • This answer should help you: stackoverflow.com/a/13332300 Commented Feb 8, 2019 at 5:05
  • You want scrapy to export json values to stdout? Commented Feb 8, 2019 at 5:26
  • Yes! As a string, of course, but I want to consume it through subprocess.check_output. Commented Feb 8, 2019 at 6:35

1 Answer

Try this. Main script (the .py file that launches the spider):

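from subprocess import Popen, PIPE

# With shell=True the command is passed as a single string.
# "-a some additional commands" is a placeholder for your own arguments.
command = "scrapy runspider yourspider.py -a some additional commands"
proc = Popen(command, shell=True, stdout=PIPE, stderr=PIPE, text=True)
# communicate() drains both pipes and waits for the process to exit.
# Calling wait() first can deadlock once a pipe buffer fills up.
out, err = proc.communicate()
if proc.returncode:
    print(err)
print('result:', out)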

Spider script (yourspider.py):

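import sys

# your code

# `something` is a placeholder for the data you want to pass back.
# Anything printed to stdout is what the parent process receives.
print(something)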
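If the spider prints JSON, the parent can decode it straight from the captured output. Below is a minimal sketch of that round trip, not a Scrapy-specific recipe. The helper name `run_and_parse_json` and the one-line child process are made up for illustration, and the child only stands in for the spider. Scrapy's command-line help also describes `-` as a feed output target meaning stdout (`-o -`), but how the feed format is selected differs between versions, so check `scrapy runspider --help` before relying on it.

```python
import json
import subprocess
import sys


def run_and_parse_json(command):
    """Run a command and decode its standard output as JSON.

    Hypothetical helper for illustration only.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,   # capture what the child prints
        stderr=subprocess.PIPE,   # keep log output out of the result
        universal_newlines=True,  # decode bytes to str
        check=True,               # raise CalledProcessError on non-zero exit
    )
    return json.loads(completed.stdout)


# Stand-in child process (an assumption, not the real spider):
# it prints a JSON list to stdout, the way the spider script would.
child = [
    sys.executable,
    "-c",
    "import json; print(json.dumps([{'title': 'example'}]))",
]

items = run_and_parse_json(child)
print(items)
```

Passing the command as a list avoids `shell=True`, so arguments such as the USER_AGENT string need no extra quoting. Keeping stderr separate matters because anything else that lands on stdout would break `json.loads`.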