
I have a project in Python 2.6 and I'd like to write a UTF-8 message to stdout using the system encoding. However, it appears that a function for this does not exist until Python 3.2:

PySys_FormatStdout

http://docs.python.org/dev/c-api/sys.html

Is there a way to do this from Python 2.6?

To clarify, I have a banner that needs to print after Py_Initialize() and before the main interpreter is run. The string is a C literal containing: "\n and Copyright \xC2\xA9"

where \xC2\xA9 is the UTF-8 encoding of the copyright symbol. I verified in gdb that the copyright symbol is encoded correctly.
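The byte check described above can also be done in code rather than in gdb. The following is a minimal sketch, not part of the Python C-API: `decode_utf8_short` is a hypothetical helper that handles only one- and two-byte UTF-8 sequences, which is enough for the copyright symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence of one or two bytes
 * starting at s. Returns the code point, or -1 if the bytes are not a
 * valid 1- or 2-byte sequence. Longer sequences are out of scope. */
long decode_utf8_short(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    if (len >= 1 && s[0] < 0x80)
        return s[0];                      /* plain ASCII */
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        long cp = ((long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return cp >= 0x80 ? cp : -1;      /* reject overlong forms */
    }
    return -1;
}
```

Passing the two bytes \xC2\xA9 yields the code point 0xA9 (U+00A9, the copyright sign). This is also why the same character appears as u'\xa9' once it has been decoded into a Python unicode object.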

Update: I just decided all this grief isn't necessary and I'm going to remove the offending character from the startup banner. There are just too many issues with this, and the documentation is lacking. My expectations were that this would be like Tcl, where:

  1. The embedded interpreter's C-API would make it easy to write Unicode to stdout in the system's encoding, rather than in some default ASCII encoding.
  2. An exception wouldn't be thrown if an offending character does not exist in the current encoding. Instead, some default replacement character would be displayed.
  3. Importing additional modules (e.g. sys) would not be necessary just to find out what the system encoding is.
  • 1. bugs.python.org/issue4947 (encode by hand in Python < 2.7)
    2. Use errors="replace" instead of errors="strict" if you must.
    3. PyUnicode_GetDefaultEncoding()
    Commented Dec 23, 2010 at 7:06
  • Thanks J.F., As of now I am just going to avoid using the character in my application's banner. Commented Dec 23, 2010 at 17:34

2 Answers


Decode the string with PyUnicode_DecodeUTF8(), then write the resulting object with PyObject_Print().


5 Comments

Thanks, I just need to know how would I get the FILE * associated with any redirections of stdout by the person executing the python interpreter.
You either want stdout itself, or PySys_GetFile("stdout", stdout), depending on what you mean by that.
I'm not too familiar working with file handles directly, but I just need to make sure that things which get written out go wherever stdout has been redirected to.
Unfortunately, the string has all of my carriage returns escaped: u'\n-------------------- and looks like some type of literal that would go into a python script. In addition, the symbol of interest, ©, is written as \xa9, which printed to the screen in my utf-8 environment should be \xc2\xa9
sys.stdout could refer to an arbitrary Python object (PyObject*) with the .write() method, but PyObject_Print() requires FILE*.

You could use PyFile_WriteObject():

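PyObject *f_stdout = PySys_GetObject("stdout");  /* borrowed reference */
PyObject *text = PyUnicode_DecodeUTF8(str, strlen(str), "strict");  /* new reference */
if (f_stdout != NULL && text != NULL)
    PyFile_WriteObject(text, f_stdout, Py_PRINT_RAW);  /* writes str(), not repr() */
Py_XDECREF(text);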

If you know the final encoding then you could use PyUnicode_AsEncodedString().
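If the final encoding turns out to be ASCII, a banner can also be downgraded by hand before it ever reaches the interpreter. The sketch below is an illustration in plain C, not Python's own implementation: `utf8_to_ascii_replace` is a hypothetical helper that writes one '?' per non-ASCII character, which mirrors the visible effect of encoding with errors="replace" to ASCII. It assumes the input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the UTF-8 string src into dst as ASCII,
 * writing one '?' for each non-ASCII character. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of characters replaced.
 * Assumes well-formed input; stray continuation bytes are dropped. */
size_t utf8_to_ascii_replace(const char *src, char *dst)
{
    size_t n, i, out = 0, replaced = 0;
    assert(src != NULL && dst != NULL);
    n = strlen(src);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80) {
            dst[out++] = (char)c;         /* ASCII passes through */
        } else if ((c & 0xC0) != 0x80) {
            dst[out++] = '?';             /* lead byte: one '?' per character */
            replaced++;
        }                                 /* continuation bytes are skipped */
    }
    dst[out] = '\0';
    return replaced;
}
```

For the banner in the question, "Copyright \xC2\xA9" becomes "Copyright ?" and the function reports one replacement, so no UnicodeEncodeError can be raised when the result is written to an ASCII-only stdout.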

5 Comments

Thanks for your suggestion. The problem I'm getting now is that it is using ASCII instead of the UTF-8 encoding of the system: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 80: ordinal not in range(128)
@Juan: what does sys.getdefaultencoding() return?
'ascii', but it needs to use the sys.stdout.encoding, 'utf-8'
Thanks J.F. But I still need to figure out where to get the system stdout encoding from the C-API without importing the sys module and calling the interpreter to do this. I guess maybe it is safe to assume that the sys module is available for import.
Giving this to J.F. as he correctly identified this as a bug.
