
I've modified a basic web crawler to gather a list of links to a site, which is likely to run into the thousands. The problem I'm having is that the script times out once I try to run it through a browser. On top of this, it was mentioned in a previous question I asked that the script may also be spawning too many processes at the same time and killing the server I run it on.

How would I go about fixing these issues? Or should I go with an open source crawler instead, and if so, which one? I can't find anything specific enough, and the phpDig site is down :/


  • No matter which script you are going to use, the only realistic way is to put the crawler into the background (with a cron job). Commented Apr 13, 2011 at 11:48

1 Answer


Processes like this are best run as PHP CLI cron jobs.
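
For illustration, a minimal crontab entry might look like the following (the script and PHP binary paths are just examples, adjust them to your setup). The CLI SAPI has no execution time limit by default, so the browser timeout problem goes away entirely:

```
# Run the crawler nightly at 02:00 using the PHP CLI binary, appending output to a log.
# /usr/bin/php and /var/www/crawler/crawl.php are placeholder paths.
0 2 * * * /usr/bin/php /var/www/crawler/crawl.php >> /var/log/crawler.log 2>&1
```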

If you need to be able to run it on demand from a web interface, then consider adding it to a queue to be run in the background using Gearman or even the unix at command, as sketched below.
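
Here is a rough sketch of the Gearman route, assuming the PECL gearman extension and a gearmand server on the default port; `crawl_site()` is a placeholder for your existing crawler code:

```php
<?php
// queue_crawl.php -- called from the web page; returns to the browser immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
// Fire-and-forget: the job is handed to a background worker, not run in the web request.
$client->doBackground('crawl_site', json_encode(['url' => 'http://www.example.com']));
echo "Crawl queued.\n";
```

```php
<?php
// crawl_worker.php -- run from the CLI (e.g. started by a cron @reboot entry);
// it blocks and waits for queued jobs.
set_time_limit(0);
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('crawl_site', function (GearmanJob $job) {
    $params = json_decode($job->workload(), true);
    crawl_site($params['url']);   // placeholder for the actual crawling logic
});
while ($worker->work());
```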

It so happens that I have written a PHP wrapper class for the Linux at job queue, which is available from my GitHub account should you choose to go down that route.
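
The wrapper class itself isn't reproduced here, but the underlying idea with plain at boils down to a single shell call; a sketch, with placeholder paths:

```php
<?php
// queue_with_at.php -- schedule the crawl to run immediately in the background via `at`.
// Requires the atd daemon and permission for the web server user to use `at`.
$command = '/usr/bin/php /var/www/crawler/crawl.php';   // example path to the CLI crawler
exec('echo ' . escapeshellarg($command) . ' | at now 2>&1', $output, $status);
echo $status === 0 ? "Crawl queued with at.\n" : "Failed: " . implode("\n", $output) . "\n";
```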
