I have a Python web scraping script, and I need to validate that a URL points to an existing website by testing connectivity to it. Can anyone help me implement this in my code?
Here's my code:
import sys, urllib

while True:
    try:
        url = raw_input('Please input address: ')
        webpage = urllib.urlopen(url)
        print 'Web address is valid'
        break
    except:
        print 'No input or wrong url format usage: http://wwww.domainname.com/ '
        print 'Please try again'

def wget(webpage):
    print '[*] Fetching webpage...\n'
    page = webpage.read()
    return page

def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])

if __name__ == '__main__':
    main()
EDIT: I found the following code in another Stack Overflow post. It works, and I just want to integrate it into my code. I tried to integrate it myself but got errors instead. Can anyone help me do this? Here's the code:
from urllib2 import Request, urlopen, URLError

req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    print 'URL is good!'
The while True loop is executed before you call main. Python runs the module from top to bottom: first the while with its suite, then the two defs (which add the functions to the scope), and then the condition, which may invoke main. In this order, your while is executed first and your main last, provided the condition holds.
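One way to combine the two snippets is to move the prompt loop into main and put the connectivity test in its own function. The following is only a rough sketch under a few assumptions: it targets Python 3, where Request and urlopen live in urllib.request, URLError lives in urllib.error, and input replaces raw_input. The names check_url, read_line and opener are made up for illustration, and the 10 second timeout is an arbitrary choice. It also drops the sys.argv handling and always prompts for the address.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError


def check_url(url, opener=urlopen):
    """Return (response, None) if the URL opens, else (None, message).

    `opener` is a hypothetical hook (not part of the original code) that
    lets the check be exercised without touching the network.
    """
    try:
        response = opener(Request(url), timeout=10)
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        # The server answered, but with an error status such as 404.
        return None, "The server couldn't fulfill the request. Error code: %s" % e.code
    except URLError as e:
        # No server could be reached (DNS failure, refused connection, ...).
        return None, "We failed to reach a server. Reason: %s" % e.reason
    except ValueError as e:
        # Raised by Request for input with no scheme, e.g. an empty string.
        return None, "Wrong URL format: %s" % e
    return response, None


def wget(webpage):
    """Read an already opened response and return its body as text."""
    print('[*] Fetching webpage...\n')
    return webpage.read().decode('utf-8', errors='replace')


def main(read_line=input, opener=urlopen):
    # The prompt loop now lives inside main(), so nothing runs at import
    # time; it only runs once main() is actually called.
    while True:
        url = read_line('Please input address: ')
        webpage, error = check_url(url, opener)
        if webpage is not None:
            print('Web address is valid')
            break
        print(error)
        print('Please try again')
    print(wget(webpage))
```

Calling main() from the usual if __name__ == '__main__': guard at the bottom of the file starts the prompt. Because the loop is inside main, the order of execution no longer depends on where the loop sits in the file.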