
I have a Python web-scraping script, and I need to validate the URL the user enters by testing connectivity to it, i.e. checking that the website actually exists. Can anyone help me implement this in my code?

Here's my code:

import sys, urllib

while True:
    try:
        url= raw_input('Please input address: ')
        webpage=urllib.urlopen(url)
        print 'Web address is valid'
        break
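    except IOError:
        # urllib.urlopen raises IOError for input it cannot open, e.g. an empty string or an unreachable host
        print 'No input or wrong url format usage: http://www.domainname.com/ '
        print 'Please try again'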

def wget(webpage):
    print '[*] Fetching webpage...\n'
    page = webpage.read()
    return page

def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])

if __name__ == '__main__':
    main()

EDIT: Here is some code I took from another Stack Overflow post. It works on its own, and I want to integrate it into my code. I tried to integrate it myself but got errors instead. Can anyone help me do this? Here's the code:

from urllib2 import Request, urlopen, URLError
req = Request('http://jfvbhsjdfvbs.com')
try:
    response = urlopen(req)
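except URLError as e:
    # In Python 2.7, HTTPError (a subclass of URLError) has both 'code' and 'reason',
    # so check 'code' first to tell HTTP errors apart from connection errors
    if hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason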
else:
    print 'URL is good!'
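A minimal Python 3 sketch of the same check, wrapped in a function so it can be called from the input loop instead of running on its own at module level. It assumes Python 3's `urllib.request` and `urllib.error` (the question's code is Python 2, where these live in `urllib2`); the name `check_url`, the 5-second timeout, and the `(ok, message)` return value are illustrative choices, not part of the original snippet. `HTTPError` is caught before `URLError` because it is a subclass of it.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, timeout=5):
    """Try to open url and return (ok, message).

    check_url is a hypothetical helper, not part of urllib.
    """
    try:
        # Opening the URL is the connectivity test; the body is never read.
        with urlopen(url, timeout=timeout):
            pass
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError subclasses URLError, so this clause must come first.
        return False, "The server couldn't fulfill the request. Error code: %d" % e.code
    except URLError as e:
        # No usable answer: unknown host, refused connection, connect timeout, ...
        return False, "We failed to reach a server. Reason: %s" % e.reason
    except OSError as e:
        # Other socket-level failures, e.g. a timeout while waiting for the response.
        return False, "We failed to reach a server. Reason: %s" % e
    except ValueError as e:
        # urlopen() rejects strings without a scheme, such as '' or 'www.example.com'.
        return False, "Wrong URL format: %s" % e
    return True, "URL is good!"
```

Inside the question's `while True` loop, `ok, message = check_url(url)` followed by `print(message)` and a `break` when `ok` is true would replace the bare `urllib.urlopen` call.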
  • Looks nice, except that your while True loop is executed before you call main. Commented Dec 10, 2013 at 17:22
  • I'd rather check the response code; look at this post. Commented Dec 10, 2013 at 17:25
  • Yes, that's what I need, but I don't know how to implement it in my code, so I'm asking if anyone can help me do this. Commented Dec 10, 2013 at 17:29
  • @Hyperboreus what do you mean? Commented Dec 10, 2013 at 17:35
  • @user3034404 A Python script is executed top to bottom. In your case that is: 1. your while loop with its suite, 2. the two defs (adding the functions to the scope), and 3. the if condition, which may invoke main. In this order, your while loop runs first and main runs last, if the condition holds. Commented Dec 10, 2013 at 17:58
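Checking the response code, as suggested in the comments, might look like this minimal Python 3 sketch (the question's code is Python 2). It sends a `HEAD` request so the page body is not downloaded and returns the HTTP status, or `None` when no server answered. `get_status`, `is_reachable`, the `opener` parameter (there only so the functions can be tried without a network), and the 2xx/3xx rule are hypothetical choices; some servers answer `HEAD` differently from `GET`, so treat the result as a hint rather than proof.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def get_status(url, timeout=5, opener=urlopen):
    """Return the HTTP status code for url, or None if no server answered.

    get_status is a hypothetical helper; opener defaults to urlopen and is
    a parameter only so the function can be tried without a network.
    """
    try:
        # HEAD asks for the status line and headers only, not the page body.
        request = Request(url, method="HEAD")
    except ValueError:
        # Malformed URL, e.g. 'www.example.com' without the http:// scheme.
        return None
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx answers; the code is still there.
        return e.code
    except (URLError, OSError):
        # No answer at all: unknown host, refused connection, timeout, ...
        return None


def is_reachable(url, opener=urlopen):
    """Treat a 2xx or 3xx status as 'the website exists'."""
    status = get_status(url, opener=opener)
    return status is not None and 200 <= status < 400
```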

2 Answers

Maybe this snippet helps you understand why your main is executed after the while loop:

print 'Checkpoint Alpha'

while True:
    print 'Checkpoint Bravo'
    if raw_input ('x for break: ') == 'x': break

print 'Checkpoint Charlie'

def main():
    print 'Checkpoint Foxtrot'

print 'Checkpoint Delta'

if __name__ == '__main__':
    print 'Checkpoint Echo'
    main()
    print 'Checkpoint Golf'

print 'Checkpoint Hotel'
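Applying that ordering to the question's script, a minimal Python 3 sketch in which the prompt loop lives inside a function, so nothing prompts until `main()` runs. `main()` takes the URL from the command line if one is given and otherwise prompts. `ask_for_url`, `url_is_reachable`, and the `read_input`/`check` parameters are hypothetical names chosen for the example; the reachability test is a plain `urlopen` call, and the `URLError` handling from the question's EDIT could be dropped in instead.

```python
import sys
from urllib.request import urlopen


def url_is_reachable(url, timeout=5):
    """Return True if urlopen() can open url (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except (ValueError, OSError):
        # ValueError: malformed URL; OSError covers URLError, HTTPError, timeouts.
        return False


def ask_for_url(read_input=input, check=url_is_reachable):
    """Prompt until the user enters a URL that passes check()."""
    while True:
        url = read_input('Please input address: ').strip()
        if url and check(url):
            print('Web address is valid')
            return url
        print('No input or wrong URL format. Usage: http://www.domainname.com/')
        print('Please try again')


def wget(url):
    """Fetch url and return the page body as bytes."""
    print('[*] Fetching webpage...\n')
    with urlopen(url) as webpage:
        return webpage.read()


def main():
    # Use the URL from the command line if one was given, otherwise prompt.
    if len(sys.argv) == 2:
        url = sys.argv[1]
    elif len(sys.argv) == 1:
        url = ask_for_url()
    else:
        print('[-] Usage: webpage_get [URL]')
        return
    print(wget(url))
```

Appending the question's own `if __name__ == '__main__': main()` lines at the bottom then gives the intended order: the functions are defined first, and the prompt only appears once `main()` is called.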

1 Comment

@KDawG You can take the officer out of the Air Force, but you can't take the Air Force out of the officer. Tally Ho!

The following should help you:

visited = []

while True:
    try:
        url = raw_input('Please input address: ')
        if url in visited:
            print "Already visited. Continue"
            continue  # skip URLs that were already entered
        visited.append(url)
        webpage = urllib.urlopen(url)
        # [...]

1 Comment

I don't think this is what I need. I need code that checks connectivity to the URL given by the user.
