I would like to fetch certain .html files from a web server. Specifically, I want to download the .html files from a website (http://www.thetabworld.com/) whose file names contain the word "metallica". How can I do that with Python? I have heard about urllib2, but as a Python noob I don't have the slightest idea how to use it.

2 Answers

"I have heard about urllib2, but as a Python noob I don't have the slightest idea how to use it."

Well, if you don't know how to use urllib2, reading some docs would be a good start.

The following are excellent resources (with examples):

Official Python docs for urllib2
urllib2 - The Missing Manual
urllib2 Cookbook
PyMOTW - urllib2
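
For a first taste of what those resources cover, here is a minimal sketch of fetching a single page. Note that urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request, which this sketch uses. The function name fetch_html and the 10-second timeout are illustrative choices, not anything taken from the linked docs.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download one page and return its body as text.

    fetch_html is a hypothetical helper name; the timeout value is an
    arbitrary assumption.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://www.thetabworld.com/") should return the front page as a string, assuming the site is reachable; network failures surface as urllib.error.URLError.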

2 Comments

RTFM is not a very helpful response
steve, my answer gave 4 useful links to the best resources on urllib2 and was accepted by the OP, so I would call it a "helpful response".

You need to use urllib2 together with an HTML parser such as lxml or BeautifulSoup to extract the links from the retrieved pages, so that you can crawl the site.
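
A rough sketch of the link-extraction step follows. To keep it dependency-free it swaps in the standard library's html.parser instead of lxml or BeautifulSoup, and it is written for Python 3. The names LinkCollector and find_matching_pages are made up for illustration, and the assumption that matching pages are reachable through plain <a href> links is a guess about how the site is laid out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper class)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def find_matching_pages(html, base_url, keyword="metallica"):
    """Return absolute .html links whose file name contains keyword."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    matches = []
    for link in collector.links:
        # Only the last path segment (the file name) is checked.
        filename = urlparse(link).path.rsplit("/", 1)[-1].lower()
        if filename.endswith(".html") and keyword.lower() in filename:
            matches.append(link)
    return matches
```

Given the HTML of a page downloaded with urllib.request.urlopen, find_matching_pages(html, page_url) returns the links worth downloading. Crawling the whole site would mean repeating this for every page linked from the ones already visited while keeping a set of seen URLs.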
