1

The script I am writing should exit back to the shell prompt with a helpful message if the data to be processed is not exactly right. The user should fix the problems flagged until the script is happy and no longer exits with error messages. I am developing the script with TDD, so I write a pytest test before I write the function.

The most heavily up-voted answer here suggests that scripts be exited by calling sys.exit or raising SystemExit.

The function:

import sys

def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        sys.exit('File {} must be encoded in UTF-8 (Unicode); try converting.'.format(file_to_test))

passes this test (where _non-text.png is a PNG file, i.e., not encoded in UTF-8):

def test_istext():
    with pytest.raises(SystemExit):
        istext('_non-text.png')

However, the script continues to run, and statements placed after the try/except block execute.

I want the script to completely exit every time so that the user can debug the data until it is correct, and the script will do what it is supposed to do (which is to process a directory full of UTF-8 text files, not PNG, JPG, PPTX... files).

Also tried:

The following also passes the test above by raising an exception that is a sub-class of SystemExit, but it also does not exit the script:

def istext(file_to_test):
    class NotUTF8Error(SystemExit): pass
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        raise NotUTF8Error('File {} must be UTF-8.'.format(file_to_test))

4 Answers

1

You can use raise Exception from exception syntax:

class MyException(SystemExit):
    pass


def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError as exception:
        raise MyException(f'File {file_to_test} must be encoded in UTF-8 (Unicode); try converting.') \
            from exception 

In this case you don't replace the original error message; your own message is added on top of it, and the original exception is kept as the cause.
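To make the chaining visible, here is a minimal sketch. The names NotUTF8Error and decode_or_exit are made up for this example, and it decodes a byte string directly instead of opening a file so that it runs anywhere. It uses str.format rather than an f-string so it also works on Python 3.5.

```python
class NotUTF8Error(SystemExit):
    """Hypothetical exit-style exception, named for this example only."""


def decode_or_exit(raw_bytes, label):
    # Decode explicitly as UTF-8 so the result does not depend on platform defaults.
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError as original:
        # "from original" stores the caught exception on the new one as __cause__.
        raise NotUTF8Error('{} must be encoded in UTF-8.'.format(label)) from original


try:
    decode_or_exit(b'\x89PNG\r\n\x1a\n', 'sample')
except NotUTF8Error as err:
    caught = err

# The original UnicodeDecodeError is still attached as the cause,
# and the custom message is available as the exit code/message.
print(type(caught.__cause__).__name__)
print(caught.code)
```

Because the class derives from SystemExit, letting it propagate uncaught should end the script with the message on stderr rather than a traceback.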

10 Comments

Hmm, I could not get this example to work, even without the error message.
After declaring the class class NotUTF8Error(SystemExit): pass, and after except UnicodeDecodeError as exception:, raise NotUTF8Error from exception gets a SyntaxError.
Never mind - I got this to work, only now the pytest is failing with NameError: name 'NotUTF8Error' is not defined.
Okay - even the pytest works now, but only using pytest.raises(SystemExit) and not, as I had expected, pytest.raises(NotUTF8Error), even though NotUTF8Error was declared with class NotUTF8Error(SystemExit): pass. With the latter, the test fails with NameError: name 'NotUTF8Error' is not defined. Though I do not quite understand why, it all works now!
I would like to upvote this answer, but I was not able to get it to work with the f'File {file_to_test}...' syntax - only with 'File {}...'.format(file), and I could not find any documentation for strings prefixed with f.
1

The try...except block is for catching an error and handling it internally. What you want to do is to re-raise the error.

def istext(file_to_test):
    try:
        open(file_to_test).read(512)
    except UnicodeDecodeError:
        print('File {} must be encoded in UTF-8 (Unicode); try converting.'.format(file_to_test))
        raise

This will print your message, then automatically re-raise the error you've caught.

Instead of just re-raising the old error, you might want to change the error type as well. In that case, give raise an explicit exception, e.g.:

raise NameError("I'm the shown error message")
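A small sketch of the difference between the two forms, for illustration only. The helper names (reraise_same, raise_different, name_of_raised) are invented here, ValueError is just an arbitrary replacement type, and bytes are decoded directly so no file is needed.

```python
def reraise_same(raw_bytes):
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        print('Input must be encoded in UTF-8; try converting.')
        raise  # bare raise: the same UnicodeDecodeError keeps propagating


def raise_different(raw_bytes):
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        # An explicit exception replaces the type seen by the caller.
        raise ValueError('Input must be encoded in UTF-8; try converting.')


def name_of_raised(func, raw_bytes):
    # Report the class name of whatever exception func raises, or None.
    try:
        func(raw_bytes)
    except Exception as err:
        return type(err).__name__
    return None


bad = b'\x89PNG'
print(name_of_raised(reraise_same, bad))
print(name_of_raised(raise_different, bad))
```

Note that neither variant raises SystemExit, so if the exception goes uncaught the script ends with a traceback rather than with only the message.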

8 Comments

My first instinct had been to create a custom exception along the lines of class NotUTF8Error(SystemExit):. So you are suggesting that after except UnicodeDecodeError: I would raise NotUTF8Error("Some message")?
Exactly. Or you specify in the definition of your error class already what the text is going to say, so that no re-raising is necessary.
I tried the second suggestion (see amended question above), but like my original attempt, it passes the pytest but does not exit the script. The first suggestion (re-raising UnicodeDecodeError after printing the message) passes a modified pytest but without exiting the script.
In that case, it is most likely that either Python automatically detects the encoding as Unicode, or that the symbol you copied into it isn't actually unicode. To force Python to use a specific encoding in the open command, use: open("Filename",encoding="UTF-8") ; Now, to make sure your file has unicode in it, copy this symbol into it: ⎌
Thank you, @Sudix, that's an interesting suggestion. In this case, however, my intention is to ensure that every visible file in the working directory is a text file. But since "text file" is hard to test for in a UTF-8 environment, I want to use the fact that open(file).read(512) will raise an exception for PNG, JPG, PPTX, DOCX... files as a simple test.
0

Your problem is not how to exit a program (sys.exit() works fine). Your problem is that your test scenario is not raising a UnicodeDecodeError.

Here's a simplified version of your example. It works as expected:

import pytest
import sys

def foo(n):
    try:
        1/n
    except ZeroDivisionError as e:
        sys.exit('blah')

def test_foo():
    # Assertion passes.
    with pytest.raises(SystemExit):
        foo(0)
    # Assertion fails: "DID NOT RAISE <type 'exceptions.SystemExit'>"
    with pytest.raises(SystemExit):
        foo(9)
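If you want to convince yourself outside of pytest that sys.exit really ends a script, one option is to run a throwaway script in a child process and look at what comes back. This is only a sketch: the script text and the message in it are made up for the example.

```python
import subprocess
import sys

# A tiny throwaway script: it calls sys.exit with a message, followed by a
# line that should never run.
script = (
    "import sys\n"
    "sys.exit('File x must be encoded in UTF-8.')\n"
    "print('still running')\n"
)

result = subprocess.run(
    [sys.executable, '-c', script],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

print(result.returncode)       # non-zero exit status
print(repr(result.stdout))     # nothing was printed after sys.exit
print(result.stderr.strip())   # the message passed to sys.exit
```

If 'still running' never shows up here but statements after the try/except do run in your real script, that suggests the except branch is not being reached there.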

Add some diagnostic printing to your code to learn more. For example:

def istext(file_to_test):
    try:
        content = open(file_to_test).read(512)
        # If you see this, no error occurred. Maybe your source
        # file needs different content to trigger UnicodeDecodeError.
        print('CONTENT len()', len(content))
    except UnicodeDecodeError:
        sys.exit('blah')
    except Exception as e:
        # Maybe some other type of error should also be handled?
        ...
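One possible reason for not seeing UnicodeDecodeError is that open() without an encoding argument uses a platform-dependent default on many Python versions. Under a single-byte default such as Latin-1, every byte decodes, so nothing is raised. The sketch below assumes that naming the encoding explicitly is acceptable for your script; istext_utf8 and exits_on are hypothetical names, and a temporary file with the PNG signature bytes stands in for _non-text.png.

```python
import os
import sys
import tempfile


def istext_utf8(file_to_test):
    # Hypothetical variant of istext() that names the encoding explicitly.
    try:
        with open(file_to_test, encoding='utf-8') as handle:
            handle.read(512)
    except UnicodeDecodeError:
        sys.exit('File {} must be encoded in UTF-8.'.format(file_to_test))


def exits_on(file_bytes):
    # Write the bytes to a temporary file and report whether istext_utf8 exits.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as handle:
            handle.write(file_bytes)
        try:
            istext_utf8(path)
        except SystemExit:
            return True
        return False
    finally:
        os.remove(path)


png_signature = b'\x89PNG\r\n\x1a\n'  # first eight bytes of a PNG file
print(exits_on(png_signature))
print(exits_on('plain text \u238c'.encode('utf-8')))
```

The first call should report an exit and the second should not, regardless of the locale the test runs under.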

1 Comment

This helpfully clarifies that pytest is not testing whether 1/0 raises ZeroDivisionError, but whether sys.exit raises SystemExit. Thanks!
0

In the end, what worked is similar to what @ADR proposed, with one difference: I was not able to get the formatted string syntax shown above to work correctly (f'File {file_to_test} must...'), nor could I find documentation of the f prefix for strings.

My slightly less elegant solution, then, for the (renamed) function:

def is_utf8(file):
    class NotUTF8Error(SystemExit): pass
    try:
        open(file).read(512)
    except UnicodeDecodeError as e:
        raise NotUTF8Error('File {} not UTF-8: convert or delete, then retry.'.format(file)) from e

passes the pytest:

def test_is_utf81():
    with pytest.raises(SystemExit):
        is_utf8('/Users/tbaker/github/tombaker/mklists/mklists/_non-text.png')
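A possible refinement, sketched under two assumptions that are not part of the solution above: the exception class is moved to module level, and the encoding is named explicitly. A class defined inside is_utf8 exists only while the function runs, which would explain the NameError when a test refers to NotUTF8Error by name. The temporary file here is a stand-in for _non-text.png.

```python
import os
import tempfile


class NotUTF8Error(SystemExit):
    """Module-level, so a test module can import it by name."""


def is_utf8(file):
    try:
        with open(file, encoding='utf-8') as handle:
            handle.read(512)
    except UnicodeDecodeError as e:
        raise NotUTF8Error('File {} not UTF-8: convert or delete, then retry.'.format(file)) from e


# Stand-in for _non-text.png: a temporary file holding the PNG signature bytes.
fd, path = tempfile.mkstemp(suffix='.png')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'\x89PNG\r\n\x1a\n')

try:
    is_utf8(path)
    raised = None
except NotUTF8Error as err:  # the specific class can now be named here
    raised = err
finally:
    os.remove(path)

print(isinstance(raised, SystemExit))
```

With the class at module level, a test that imports it can use pytest.raises(NotUTF8Error) instead of the broader pytest.raises(SystemExit).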

2 Comments

Are you serious? You don't like my solution anymore because I use one simple feature of Python 3.6? docs.python.org/3/whatsnew/3.6.html python.org/dev/peps/pep-0498
@ADR thank you for the reference - My apologies - I looked for that feature but didn't find it; it didn't work for me because I have Python 3.5.2. I'm removing the acceptance and accepting your answer!
