2

I'm currently writing a web crawler script, but my internet connection is very slow. Is it possible to build a multithreaded web crawler using mechanize, urllib, or something similar? If anyone has experience with this, I'd much appreciate it if you shared some information. I searched Google but didn't find much that was useful. Thanks in advance.

3 Answers

4

There's a good, simple example on this Stack Overflow thread.

1 Comment

+1 That is a good piece of sample code. I think I'll use that myself!
3

Practical threaded programming with Python is worth reading.
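For a concrete starting point, here is a minimal sketch of a threaded fetcher. It assumes Python 3 and uses only the standard library (`concurrent.futures` and `urllib.request`) rather than mechanize. The names `fetch` and `crawl` and the `max_workers` default are made up for illustration, not taken from the article above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, fetcher=fetch, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns two dicts keyed by URL: downloaded bodies and raised exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so slow downloads overlap.
        future_to_url = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # timeouts, DNS failures, HTTP errors
                errors[url] = exc
    return results, errors
```

Calling `crawl(["http://example.com/"])` would return the page bodies and any errors. The `fetcher` parameter is there so you can swap in a mechanize-based download function or a stub for testing.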

2 Comments

It's a great resource! :) In addition, is there a small script or function that saves the results from crawled web pages? Thanks.
@paul, I don't know of one. I only needed to save fetched pages for demo purposes, so pickle, sqlite, or writing files directly to a directory was enough for me.
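Since sqlite came up as one option for saving fetched pages, here is a rough sketch using the standard-library `sqlite3` module. The table name `pages` and the helpers `open_store`, `save_page`, and `load_page` are hypothetical names chosen for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with a single url -> body table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body BLOB)"
    )
    return conn


def save_page(conn, url, body):
    """Insert the page, overwriting any earlier copy of the same URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
            (url, body),
        )


def load_page(conn, url):
    """Return the stored body for url, or None if it was never saved."""
    row = conn.execute(
        "SELECT body FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

Pass a file name such as `open_store("crawl.db")` to keep the data between runs. By default a `sqlite3` connection may only be used from the thread that created it, so in a threaded crawler it is simplest to collect results and call `save_page` from the main thread.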
1

Making requests to many websites at the same time will certainly speed up your crawl, since you don't have to wait for one response to arrive before sending the next request.

However, threading is just one of the ways to do that (and a poor one, I might add). Don't use threading for that. Just don't wait for the response before sending another request! You don't need threads to do that.
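To illustrate the idea of sending requests without waiting for earlier responses, here is a small single-threaded sketch using Python 3's `asyncio`. It is only an illustration: `crawl_async` and `fake_fetch` are hypothetical names, and `fake_fetch` simulates network latency with `asyncio.sleep` instead of doing real HTTP. A real crawler would plug in an asynchronous HTTP client instead.

```python
import asyncio


async def crawl_async(urls, fetcher, max_concurrent=10):
    """Run fetcher(url) for every URL concurrently in a single thread.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it.
    """
    semaphore = asyncio.Semaphore(max_concurrent)  # cap in-flight requests

    async def bounded(url):
        async with semaphore:
            return await fetcher(url)

    # gather() starts all requests; none waits for another to finish.
    pages = await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, pages))


async def fake_fetch(url):
    """Stand-in for a real download: waits briefly, returns dummy HTML."""
    await asyncio.sleep(0.1)
    return "<html>" + url + "</html>"


# Example call (not executed here):
# pages = asyncio.run(crawl_async(["http://a", "http://b"], fake_fetch))
```

Because every request is waiting at the same time, the total time is close to that of the slowest request rather than the sum of all of them.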

A good idea is to use Scrapy. It is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It is written in Python and can make many concurrent connections to fetch data at the same time (without using threads to do so). It is really fast. You can also study it to see how it is implemented.

2 Comments

Thanks! How does it compare with mechanize? I mean in terms of speed. Thanks in advance.
@paul: It will certainly be faster than mechanize. It is also easier to do the right thing with it.
