
I have a variable that contains a Persian string, and I cannot save that string into the database correctly. I am using Flask for a REST API, and I get the string from the client. Here's my code:

    from flask import abort, jsonify, request
    import mysql.connector

    @app.route('/getfile', methods=['POST'])
    def get_file():
        # check the validity of the JSON payload
        if not request.json or 'FileName' not in request.json:
            abort(400)
        if not request.json or 'FilePath' not in request.json:
            abort(400)
        if not request.json or 'Message' not in request.json:
            abort(400)
        # retrieve data from request
        filename_ = request.json['FileName']
        filepath_ = request.json['FilePath']
        message_ = request.json['Message']

        try:
            conn = mysql.connector.connect(host=DBhost, database=DBname, user=DBusername, password=DBpassword)
        except mysql.connector.Error:
            return jsonify({'Result': 'Error, Could not connect to database.'})

        cursor_ = conn.cursor()
        query_ = "INSERT INTO sms_excel_files VALUES(null,%s,%s,%s,0)"
        data_ = (filename_, Dst_num_file, message_)
        cursor_.execute(query_, data_)
        last_row_id_ = cursor_.lastrowid
        conn.commit()

The variable in question is message_. English text is saved correctly, but Persian text is not. I also added # -*- coding: utf-8 -*- at the top of my code, but this did not solve the problem. However, if I manually assign a Persian string to message_, it is saved correctly to the database. Furthermore, if I simply return the value of message_, it is correct.

For example, this is what gets inserted into the database when message_ contains the string 'سلام':

&#1587;&#1604;&#1575;&#1605;

Any help is appreciated.

2 Answers


Please note that this is the first time I am trying to read Arabic / Persian characters, so the following information might not be correct (I could have made a mistake when comparing my test output with the Persian string you have shown in your question). Furthermore, I had never heard of Flask before.

Having said this:

1587 1604 1575 1605 is the sequence of Unicode code points (in decimal) that represents the Persian string you have shown. In HTML, a code point can be written as a numeric character reference (entity) of the form &#NNNN;, where NNNN is the decimal code point. So the string &#1587;&#1604;&#1575;&#1605; is one of the allowed HTML representations of سلام.
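To double-check that mapping, here is a small Python 3 sketch (an illustration, not code from the question) that prints the code points of سلام and builds the decimal entity form from them. The standard-library function html.unescape turns the entities back into the original text.

```python
import html

# 'سلام' written with escapes so the result does not depend on the source file encoding
word = "\u0633\u0644\u0627\u0645"

# Decimal Unicode code point of each character
code_points = [ord(ch) for ch in word]
print(code_points)  # [1587, 1604, 1575, 1605]

# Decimal HTML numeric character references (&#NNNN;)
entities = "".join("&#%d;" % cp for cp in code_points)
print(entities)  # &#1587;&#1604;&#1575;&#1605;

# Decoding the entities gives back the original string
print(html.unescape(entities) == word)  # True
```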

Given that, there could be two possible reasons for the misbehavior:

1) request.json['Message'] already contains / returns HTML (not plain text) and, for some reason I don't know, returns the string in question in HTML-entity-encoded form. This is the first thing you should check.

2) cursor_.execute(...) somehow encodes the string into HTML and thereby (for some reason I don't know) encodes your string in question into HTML-entity encoded form. Perhaps you have told your database driver to encode non-ASCII characters in message_ as HTML entities?
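One way to tell these two cases apart is to inspect the raw value (for example with repr() or by writing it to a log) instead of sending it to the browser, which renders entities as characters. The helper contains_html_entities below is a hypothetical name used only for illustration; it is a minimal Python 3 sketch that looks for decimal or hexadecimal numeric character references.

```python
import re

# Matches decimal (&#1587;) and hexadecimal (&#x633;) numeric character references
_NUMERIC_ENTITY = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def contains_html_entities(text):
    """Return True if text contains at least one HTML numeric character reference."""
    return _NUMERIC_ENTITY.search(text) is not None


# Hypothetical values a client might send
plain = "\u0633\u0644\u0627\u0645"        # real Persian characters
encoded = "&#1587;&#1604;&#1575;&#1605;"  # the same word as HTML entities

print(repr(plain), contains_html_entities(plain))
print(repr(encoded), contains_html_entities(encoded))
```

Called on message_ right after it is read from request.json, a True result would point to case 1: the entities are already there before the database is involved.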

For further analysis, you could check what happens in a test case where request.json['Message'] contains / returns only ASCII characters.

If ASCII characters are written into the database as HTML entities as well, there must be a basic problem which causes all characters without exception to be encoded into HTML entities.

Otherwise, you have probably not told your database, your database driver or your file system driver which encoding to use. In such cases, ASCII characters are often handled correctly, whereas weird things happen to non-ASCII characters. Automatically encoding non-ASCII characters as HTML entities during file IO or database operations would be very unusual, though. But as mentioned above, I don't know Flask ...

Please consult the MySQL manual to see how to set the character encoding for databases, tables, columns and connections, your database driver documentation to see which other things you must do to get this encoding to be handled correctly, and your interpreter's and its libraries' manuals to see how to correctly set that encoding for file IO (CGI works via STDIN / STDOUT).
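For the driver already used in the question (mysql-connector-python), the connection character set can be passed to connect() through its charset and use_unicode arguments. The sketch below assumes utf8mb4 as the default; plain utf8 would also cover Persian. The function name utf8_connection_kwargs is hypothetical, and the actual connection is shown only in comments because it needs a running server.

```python
# Hypothetical helper: builds keyword arguments for mysql.connector.connect()
# with the connection character set stated explicitly.
def utf8_connection_kwargs(host, database, user, password, charset="utf8mb4"):
    return {
        "host": host,
        "database": database,
        "user": user,
        "password": password,
        "charset": charset,   # character set used for the client/server connection
        "use_unicode": True,  # return text columns as Python text instead of bytes
    }


# Query to run on an open connection to see which character sets are actually in effect
CHARSET_CHECK_SQL = "SHOW VARIABLES LIKE 'character_set%'"

# Usage (requires mysql-connector-python and a reachable server):
#   import mysql.connector
#   conn = mysql.connector.connect(
#       **utf8_connection_kwargs(DBhost, DBname, DBusername, DBpassword))
#   cur = conn.cursor()
#   cur.execute(CHARSET_CHECK_SQL)
#   for name, value in cur.fetchall():
#       print(name, value)
```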

You make your life a lot easier if the database character encodings and the file IO encoding are all the same. Personally, I always use UTF-8.
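As a small illustration of stating the file IO encoding explicitly (a sketch, not the original poster's setup), io.open accepts an encoding argument on both Python 2.7 and Python 3, so the platform default encoding never comes into play. The u"" prefix keeps the literal a text string on Python 2.7 as well.

```python
import io
import os
import tempfile

text = u"\u0633\u0644\u0627\u0645"  # 'سلام'

directory = tempfile.mkdtemp()
path = os.path.join(directory, "message.txt")

# Write and read back with an explicit UTF-8 encoding
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

with io.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

# Look at the raw bytes on disk: each of these four letters takes 2 bytes in UTF-8
with io.open(path, "rb") as f:
    raw = f.read()

os.remove(path)
os.rmdir(directory)

print(read_back == text)  # True
print(len(raw))           # 8
```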

A final note: # -*- coding: utf-8 -*- is not a Flask feature but a Python source-encoding declaration (PEP 263). It only tells the interpreter how the script itself is encoded, not which encoding to use for input / output / database operations.


8 Comments

I don't think it is as hard as you explained. I can return message_ correctly right after message_=request.json['Message']. The string gets corrupted when it is inserted into the database. The database collation is utf8_general_ci and the column collation is also utf8_general_ci.
@Sinai How exactly did you check that you can return message_ correctly? Did you dump it into a file, did you print it out to the browser, or something else?
I printed it out to the browser. I simply put a return message_ just after the message_=request.json['Message'] in my code and I could see the exact string!
@Sinai As I have explained in my answer, the browser will show the string '&#1587;&#1604;&#1575;&#1605;' as 'سلام'. So dumping the variable's content to the browser is not meaningful (unless you look at the page source, which I suppose you didn't). Please read my answer thoroughly again and try to understand every section. If something is not clear, post a comment, and I'll do my best to clarify.
Thank you for your explanation. I think you are right. I saved the value of message_ to a file and it already contained &#1587;&#1604;&#1575;&#1605;! I tried several ways to convert the HTML-entity-encoded text to plain text, but I always get the error UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
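The error in the last comment is what an implicit conversion of a text string with the ASCII codec produces (on Python 2.7 this happens, for example, when str() is called on a unicode value containing Persian letters). The Python 3 sketch below decodes the entities with html.unescape, reproduces the same message with an explicit ASCII encode, and shows that encoding explicitly as UTF-8 succeeds. It is an illustration under those assumptions, not the poster's code.

```python
import html

encoded = "&#1587;&#1604;&#1575;&#1605;"
text = html.unescape(encoded)  # the four Persian letters

error_message = None
try:
    # Forcing ASCII mimics an implicit ASCII conversion
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)

# Encoding explicitly as UTF-8 works for any Unicode text
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```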

Try this code. It uses the MySQLdb library, which is very similar to the library you are using (install it with pip first).

It sets "utf-8" in every way I could find.

    # -*- coding: utf-8 -*-
    import MySQLdb
    import MySQLdb.cursors

    # Open database connection
    try:
        db = MySQLdb.connect(host="localhost",
                             user="root",
                             passwd="",
                             db="db_name"
                             #,unix_socket="/opt/lampp/var/mysql/mysql.sock"
                             )
        # Set the connection character set to utf8
        db.set_character_set('utf8')

        crsr = db.cursor(MySQLdb.cursors.DictCursor)
        # Make the session character set variables explicit as well
        crsr.execute('SET NAMES utf8;')
        crsr.execute('SET CHARACTER SET utf8;')
        crsr.execute('SET character_set_connection=utf8;')

    except MySQLdb.Error as e:
        print(e)

1 Comment

I could not install MySQLdb. I tried several ways, but none of them worked. I am using Python 2.7.
