
I have a large file containing thousands of links. I've written a script that reads the file line by line and performs various analyses on each linked webpage. However, sometimes a link is faulty (the article was removed from the website, etc.), and my whole script just stops at that point.

Is there a way to circumvent this problem? Here's my (pseudo)code:

import urllib2
import lxml.html

for row in file:
    url = row[4]
    req = urllib2.Request(url)
    tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    # perform analyses
    # append analysis results to lists
# output data

I have tried

except:
    pass

But it royally messes up the script for some reason.
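
In case it helps, I suspect the placement was the problem: I had the try wrapped around the entire loop, roughly like the sketch below, so the first bad link still aborted every remaining row, and the bare except hid the actual error.

try:
    for row in file:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analysis results to lists
except:
    pass  # swallows every error, including bugs in the analyses
# output data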

2 Answers


Works for me:

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analysis results to lists
    except urllib2.URLError, e:
        pass
# output data
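
If you also want to know which links failed, a small variation on the same idea records the URL instead of silently passing (just a sketch; the failed list is my own addition, not something from your code):

failed = []
for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analysis results to lists
    except urllib2.URLError, e:
        failed.append(url)                  # remember the dead link
        print 'skipped %s (%s)' % (url, e)  # and say why
# output data, and optionally the failed list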

2 Comments

Do you mean to put the try/except block inside the for loop? Otherwise it will stop at that line, exactly the problem the OP wants to avoid.
Oh god, it's getting late. I'm not thinking 100%. Thank you for the correction.

A try block is the way to go:

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except urllib2.URLError, e:
        continue
    # perform analyses
    # append analysis results to lists
# output data

continue skips whatever computation follows the URL check, so no analysis is wasted on a dead link, and execution restarts at the next iteration of the loop.
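
For completeness: if you ever move to Python 3, urllib2 was split into urllib.request and urllib.error, so the same pattern would look roughly like this sketch:

import urllib.request
import urllib.error
import lxml.html

for row in file:
    url = row[4]
    try:
        req = urllib.request.Request(url)
        tree = lxml.html.fromstring(urllib.request.urlopen(req).read())
    except urllib.error.URLError:
        continue  # dead link: go straight to the next row
    # perform analyses
    # append analysis results to lists
# output data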

