In Java I want to check some Python 2 files for syntax errors, so using Jython seemed like a good choice. In theory this should be easy, as indicated in another answer. As I'm reading from a file, I use a Reader. I would really prefer to use an InputStream.
Reader reader = openReaderToPythonFile();
new org.python.util.PythonInterpreter().compile(reader);
The only compile() options for PythonInterpreter take either String or Reader as parameters. This means that the content I feed it would already be in Unicode string form, not bytes.
The problem is that I want to check an existing Python file that has lines at the top indicating an encoding of UTF-8, following PEP 263. (This is because Python 2 source files are considered ASCII by default.) It looks something like this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
…
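For reference, here is a minimal sketch of how such a declaration can be detected from Java, using the regular expression given in PEP 263 (it must match on the first or second line). The class and method names are hypothetical, and this only illustrates the rule; it is not how Jython itself is implemented.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Pep263 {
    // Regular expression from PEP 263; only lines 1 and 2 of a file are checked.
    private static final Pattern CODING =
            Pattern.compile("^[ \\t\\f]*#.*?coding[:=][ \\t]*([-_.a-zA-Z0-9]+)");

    /** Returns the declared encoding name if one of the first two lines declares it. */
    public static Optional<String> declaredEncoding(String source) {
        // Split into at most three pieces: line 1, line 2, and the rest.
        String[] lines = source.split("\\r?\\n", 3);
        for (int i = 0; i < Math.min(2, lines.length); i++) {
            Matcher m = CODING.matcher(lines[i]);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

For the header shown above, `Pep263.declaredEncoding(...)` would find the declaration on the second line.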
Even if I manually read the file (correctly) as UTF-8, when I pass the string (or Reader instance) to PythonInterpreter to compile, I get this error:
encoding declaration in Unicode string
In other words, PythonInterpreter is saying, "This file has an encoding declaration, but I can't respect it because you've already converted the bytes to a string before I had a chance to analyze them." But PythonInterpreter doesn't seem to provide a way to pass it the raw bytes or (preferably) an InputStream.
How can I compile a Python file with Jython if the file contains an encoding declaration? If that's not possible, is there a workaround that makes Jython ignore the encoding declaration and trust that I've correctly converted the bytes to a String or Reader?
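To make the workaround idea concrete, here is an untested sketch of the kind of thing I have in mind: decode the bytes as UTF-8 myself, then blank out the declaration on the first two lines before handing the text to Jython. The class and method names are hypothetical, and whether Jython then accepts the source is an assumption I have not verified.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class CodingDeclarationStripper {
    // Same pattern as in PEP 263, without the capturing group.
    private static final Pattern CODING =
            Pattern.compile("^[ \\t\\f]*#.*?coding[:=][ \\t]*[-_.a-zA-Z0-9]+");

    /** Replaces a PEP 263 declaration on line 1 or 2 with a bare comment. */
    public static String stripDeclaration(String source) {
        String[] lines = source.split("\n", -1); // -1 keeps trailing empty strings
        for (int i = 0; i < Math.min(2, lines.length); i++) {
            if (CODING.matcher(lines[i]).find()) {
                // Keep the line itself (and any trailing \r) so that line
                // numbers in later syntax error reports do not shift.
                lines[i] = lines[i].endsWith("\r") ? "#\r" : "#";
            }
        }
        return String.join("\n", lines);
    }

    /** Decodes raw file bytes as UTF-8 (assumed encoding) and strips the declaration. */
    public static String decodeAndStrip(byte[] rawBytes) {
        return stripDeclaration(new String(rawBytes, StandardCharsets.UTF_8));
    }
}
```

The resulting string would then go to `new org.python.util.PythonInterpreter().compile(...)`. This hard-codes UTF-8 rather than honoring whatever encoding the file declares, so it would only be appropriate when I already know the encoding.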