I want to retrieve some data from a database. All of its tables use the utf8_general_ci collation.

By the way, this is a .cgi file, so it is executed by means of an Ajax call.

I'm doing this to make the connection:

#!/home/mike/python_venvs/test_venv369/bin/python
...

conn = mysql.connector.connect( host='', database='test_kernel',
                                user='root', password='root',
                                charset='utf8', use_unicode=True )
...
query = ("SELECT * from invoices limit 2")
cursor.execute( query )

for x in cursor:
    print( type( x  )) # is a tuple, i.e. the row
    for y in x:
        print( type( y ) ) # the problem field prints "str"
        if type( y ) == 'str':
            y = y.encode( 'utf-8')
        print( y )

On the encoding line above I get:

<class 'UnicodeEncodeError'> 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)

With all the permutations I've tried I get the same thing. '\xa3', by the way, is the '£' character, non-ASCII.
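
A minimal sketch of what the message describes, assuming the script's standard output ends up ASCII-encoded: the str itself is fine, and the failure appears when it is written to an ASCII text stream. The io.BytesIO buffer and the helper name try_write are purely illustrative stand-ins, not part of my script.

```python
import io

# Stand-in for a stdout whose encoding is ASCII (illustrative, not the real CGI stream).
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_stdout = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")


def try_write(stream, text):
    """Write text to the stream; return the error message if it cannot be encoded."""
    try:
        stream.write(text)
        stream.flush()
        return None
    except UnicodeEncodeError as exc:
        return str(exc)


# '\xa3' is the pound sign; it has no ASCII representation.
ascii_error = try_write(ascii_stdout, "\xa3100")
utf8_error = try_write(utf8_stdout, "\xa3100")

# The error message is pure ASCII, so printing it is safe anywhere.
print(ascii_error)
print(utf8_error)
```

The same string goes through without complaint once the stream's encoding is UTF-8, which suggests the data coming out of MySQL is not the problem.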

I've tried many different approaches, found mainly here on SO: encode, decode, and so on. Nothing seems to work. I thought this kind of str problem belonged to Python 2, but this is definitely a Python 3 program, which I checked with sys.version_info[ 0 ]!

  • What's the output of python -c "import locale;print(locale.getpreferredencoding())"? Commented May 16, 2020 at 15:02
  • Thanks. I assume this is to be run in the virtual environment stipulated in the shebang at the start of this file, right? It gives "UTF-8". Adding shebang line in the question. Commented May 16, 2020 at 15:08
  • If this is .cgi, then it's being executed by your webserver? So maybe su to the webserver's user (or perhaps root, if root starts the webserver process) and execute the above command as that user. Commented May 16, 2020 at 15:13
  • Thanks again. Same output for SU. I assume the Apache server process's owner is the SU... but how do I check that? To start and stop I do sudo systemctl start apache2. Commented May 16, 2020 at 15:19
  • Maybe just print locale.getpreferredencoding() before you iterate over the query result Commented May 16, 2020 at 15:26
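
The suggestion in the last comment could look something like the sketch below, which reports the encodings Python is using from inside the CGI process itself rather than from a login shell. The helper name describe_encodings is made up for illustration.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings that decide how print() output is encoded."""
    return {
        "preferred": locale.getpreferredencoding(),
        # getattr guards against a replaced stdout that has no encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    # The names and values are ASCII-only, so this is safe even under an ASCII locale.
    print(name, value)
```

Running this inside the .cgi file, before iterating over the query result, shows what the webserver-launched process sees, which may differ from what an interactive shell reports.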

1 Answer

Thanks to the help of snakecharmerb's comments, which then led me to this answer, I found a solution which works:

import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
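
A roughly equivalent sketch uses io.TextIOWrapper, which is also available on Python 3.6. The function name utf8_text_stream is hypothetical, and the io.BytesIO buffer only stands in for sys.stdout.buffer so the result can be inspected.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


# Demonstration with an in-memory buffer standing in for sys.stdout.buffer.
buffer = io.BytesIO()
out = utf8_text_stream(buffer)
out.write("\xa3100")
out.flush()
encoded = buffer.getvalue()  # the pound sign becomes the two bytes C2 A3

# In the CGI script itself the equivalent would be, near the top of the file:
#     import sys
#     sys.stdout = utf8_text_stream(sys.stdout.buffer)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') should achieve the same thing without replacing the stream object.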

I think this constitutes a workaround, and it'd be great if anyone could explain how locale.getpreferredencoding() comes to be set to ASCII/ANSI_X3.4-1968 ... even better if they could then say how to set it to something else.

The culprit is probably Apache, though I'm far from sure.

The question referenced by snakecharmerb unfortunately did not provide a solution for me: I added (or rather uncommented) the following line in /etc/apache2/conf-enabled/charset.conf

AddDefaultCharset UTF-8

... and restarted Apache. No change.

Edit
Output from various settings for su which might be involved:

M17A ~ # locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
M17A ~ # echo $LANG
en_GB.UTF-8
M17A ~ # locale charmap
UTF-8

I believe it is su/root which is indeed running the Apache process.

Edit 2
I thought I'd look into the ownership of the processes on my machine. So I ran ps aux. Some possibly relevant processes came up which are not owned by me or by root:

USER # i.e. owner
...
mysql     1413  0.0  0.1 1419400 16760 ?       Ssl  May15   0:50 /usr/sbin/mysqld
...
www-data  5825  0.0  0.0 143296  5536 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5826  0.0  0.1 298492 21900 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5827  0.0  0.1 298096 18700 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5828  0.0  0.0 296044 15872 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5829  0.0  0.1 296040 16876 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
www-data  5830  0.0  0.0 296052  7972 ?        S    07:35   0:00 /usr/sbin/apache2 -k start
...
www-data  9636  0.0  0.0 296052  7856 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9639  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9640  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start
www-data  9641  0.0  0.0 295572  6324 ?        S    08:16   0:00 /usr/sbin/apache2 -k start

Maybe one of these owners is using this ASCII encoding? I wonder how I might find out...
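
One way to find out might be to read the environment a process was started with. This is a Linux-only sketch, assuming /proc is mounted and that the script runs as root or as the process owner; the function names parse_environ and locale_settings are hypothetical.

```python
from pathlib import Path

# Variables consulted for the character encoding, in order of precedence.
LOCALE_KEYS = ("LC_ALL", "LC_CTYPE", "LANG")


def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def locale_settings(pid):
    """Return the locale-related variables a running process was started with."""
    raw = Path("/proc/{}/environ".format(pid)).read_bytes()
    env = parse_environ(raw)
    # Missing variables are reported as None.
    return {key: env.get(key) for key in LOCALE_KEYS}
```

For example, locale_settings(5825) would inspect one of the www-data Apache workers listed above. If LANG comes back as C or is missing, that would be consistent with the ASCII default. On Debian and Ubuntu the Apache environment is typically set in /etc/apache2/envvars, so that file may be worth checking, though whether it applies to this machine is an assumption.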

2 Comments

The root cause is likely that the user running the Apache process has an ASCII locale (try su to that user and execute the locale command, and also echo $LANG and locale charmap). So the solution is probably to change that user's default locale (but I'm not strong on Linux admin, so I don't want to suggest this as a hard and fast solution).
I'm not strong on it either. I'll have a go, but all indications are that it is root (i.e. su) which runs the apache process. I'm adding the output from those commands to my answer.
