Fetch certain .html files from web server

Question

I would like to fetch certain .html files from a web server. Specifically, I want to fetch the .html files from a web site (http://www.thetabworld.com/) that have the word "metallica" in their file name. How can I do that using Python? I have heard about urllib2, but as a Python noob I don't have the slightest idea how to use it.

Corey Goldberg · Accepted Answer · 2010-01-19 20:34:53Z

1

"I have heard about urllib2 but as a python noob, I don't have a slightest idea how to use it."

Well, if you don't know how to use urllib2, reading some docs would be a good start.

The following are excellent resources (with examples):

Official Python docs for urllib2
urllib2 - The Missing Manual
urllib2 cookbook
PyMOTW - urllib2
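
For readers who want a starting point before working through those resources, here is a minimal sketch of fetching one page and saving it to disk. It makes several assumptions: urllib2 is a Python 2 module, and in Python 3 its functionality lives in urllib.request, which is what the sketch uses. The helper names `fetch_html` and `save_page`, the 10-second timeout, and the UTF-8 fallback are illustrative choices, not anything prescribed by the linked docs.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the body of `url` decoded as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_page(url, path, timeout=10):
    """Fetch `url`, write its text to `path`, and return the character count."""
    html = fetch_html(url, timeout=timeout)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return len(html)
```

Calling `save_page("http://www.thetabworld.com/", "index.html")` would store the front page locally, assuming the site is reachable.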



2 Comments

Steve McLeod Over a year ago

RTFM is not a very helpful response

Corey Goldberg Over a year ago

Steve, my answer gave 4 useful links to the best resources on urllib2, and it was accepted by the OP. So I would call it a "helpful response".

Ignacio Vazquez-Abrams · Accepted Answer · 2010-01-19 19:38:56Z

1

You need to use urllib2 together with an HTML parser such as lxml or BeautifulSoup: urllib2 retrieves the pages, and the parser extracts the links from them so that you can crawl the site.
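
The link-extraction step might look roughly like the sketch below. To stay dependency-free it swaps in the standard library's html.parser in place of lxml or BeautifulSoup, so it is not the setup named above, only the same idea. The names `LinkCollector` and `matching_html_links` are hypothetical, and the sketch assumes the site exposes its pages through ordinary `<a href="...">` links, which has not been checked against the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def matching_html_links(html, base_url, keyword="metallica"):
    """Return absolute URLs of .html links whose file name contains `keyword`."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        filename = urlparse(absolute).path.rsplit("/", 1)[-1].lower()
        if filename.endswith(".html") and keyword.lower() in filename:
            if absolute not in matches:  # skip duplicates, keep page order
                matches.append(absolute)
    return matches
```

The `html` argument is the text of a page that has already been downloaded. A crawler would call this on each retrieved page, fetch the returned URLs, and also follow the non-matching links to discover more pages.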

