
I wrote a Python daemon that parses web pages, but it sometimes fails with errors because some of the pages are not compatible with the parser.

My actual question: how can I make the script keep running when an error occurs instead of stopping, and, if possible, record all the errors in a log file?

Thanks.

Part of my code:

# result - list of link rows to process
for row in result:
    page_html = getPage(row['url'])
    self.page_data = row

    if page_html is False:  # fetch failed: remove the page from the index and skip it
        self.deletePageFromIndex(row['id'])
        continue

    parser.mainlink = row['url']
    parser.feed(page_html)

    links = parser.links # get links from page
    words = wordParser(page_html)  # words from page

    # insert data to DB
    self.insertWords(words)
    self.insertLinks(links)

    # print row['url'] + ' parsed. sleep... '

    self.markAsIndexed(row['id'])
    sleep(uniform(1, 3))  # pause 1-3 seconds between pages
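
(For the log-file part: the standard logging module can be pointed at a file with a single basicConfig call. A minimal sketch; the filename and format below are only examples, not taken from the code above.)

import logging

logging.basicConfig(
    filename='parser.log',                            # example path
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)

logging.error('could not parse %s', 'http://example.com/page')  # appended to parser.log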

1 Answer


Here's what you can do:

import logging
should_abort = False  # flip to True when the daemon should shut down

def do_stuff():
    global should_abort
    ...  # your per-iteration work goes here; set should_abort = True to stop

def main():
    while not should_abort:  # your main loop
        try:
            do_stuff()
        except MyException1 as e:   # your own, more specific exceptions first
            logging.exception('GOT MyException1 %s', e)
        except MyException2 as e:
            logging.exception('GOT MyException2 %s', e)
        except Exception as e:      # catch-all, so one bad page never kills the loop
            logging.exception('UNKNOWN EXCEPTION %s', e)

This still lets you stop the daemon with Ctrl-C, because KeyboardInterrupt derives from BaseException, not Exception, so it is not swallowed by the except Exception clause.
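
Applied inside the same method as the loop from the question (so result, getPage, parser, wordParser and the self.* helpers are the names already used there and are assumed to exist), the pattern looks roughly like this. Only a sketch; the log filename is an example.

import logging
from random import uniform
from time import sleep

logging.basicConfig(filename='parser.log', level=logging.ERROR)  # once, at startup

for row in result:
    try:
        page_html = getPage(row['url'])
        self.page_data = row

        if page_html is False:                  # fetch failed: drop it and move on
            self.deletePageFromIndex(row['id'])
            continue

        parser.mainlink = row['url']
        parser.feed(page_html)

        self.insertWords(wordParser(page_html))   # words from page
        self.insertLinks(parser.links)            # links from page
        self.markAsIndexed(row['id'])
    except Exception:
        # logging.exception writes the full traceback to the log, then the loop moves on
        logging.exception('error while processing %s', row['url'])

    sleep(uniform(1, 3))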
