I have an Android application that reads SMS messages and sends them to a Google App Engine server. Some users are complaining that messages in certain languages are not coming through properly.

        // Execute query
        cursor = context.getContentResolver().query(
                SMS_PROVIDER_URI,
                SMS_QUERY_FIELDS,
                "date >= " + startDate.getTime(),  // selection - get messages > startDate
                null,                              // selectionArgs
                "date ASC");                       // order - get oldest messages first

        // Iterate results
        if (cursor != null && cursor.moveToFirst()) {

            // read through all the sms and create a list
            do {
                String sender              = cursor.getString(0);
                String message             = cursor.getString(2);
                boolean isIncomingMessage  = cursor.getString(3).contains("1");
                Date date                  = new Date(cursor.getLong(1));

                String contactName = ContactLookup.lookup(context, sender);

                smsList.add(new SMSMessageInfo(sender, contactName,
                        message, isIncomingMessage, date));

            } while (cursor.moveToNext());
        }

The message variable contains SMS messages in different languages. How do I support that? I also need to send the messages to my server (Python). How do I handle the Unicode text on the server?

1 Answer

In Python 2.7 there are two string types: str (the standard strings, consisting of bytes) and unicode (consisting of Unicode characters, written as a literal with the u prefix: u"foo"). Conversions between them are done with methods on the instances:

    u"blä".encode('utf8') → "bl\xc3\xa4"  # from unicode to str
    "bl\xc3\xa4".decode('utf8') → u"blä"  # from str to unicode

Conversion often takes place implicitly. For example, if you add a str to a unicode, the str gets promoted to a unicode (by default using the ascii encoding) prior to concatenation.
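To make the explicit and implicit conversions concrete, here is a minimal sketch. It does a UTF-8 round trip and then performs by hand the ascii decode that implicit promotion applies. It uses escape sequences and b"..." literals so that it should run on Python 2.7 as well as Python 3. The helper name promote_like_python2 is made up for illustration.

```python
text = u"bl\xe4"                    # unicode text: b, l, ä (U+00E4)
encoded = text.encode("utf-8")      # byte string b"bl\xc3\xa4"
decoded = encoded.decode("utf-8")   # back to unicode text


def promote_like_python2(data):
    """Decode a byte string with the ascii codec, as implicit promotion does."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # Non-ASCII bytes cannot be promoted; report why instead of crashing.
        return u"decode failed: " + exc.reason


print(len(text))                      # number of characters
print(len(encoded))                   # number of bytes (ä takes two in UTF-8)
print(promote_like_python2(b"abc"))   # pure ASCII bytes promote fine
print(promote_like_python2(encoded))  # UTF-8 bytes for ä do not
```

Pure ASCII data survives the promotion, which is why this kind of bug tends to show up only once messages in other languages arrive.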

On the other hand, a unicode instance that gets printed will be converted to a str first, using an encoding that depends on the stream it gets printed on (typically ascii as well).
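The output direction can be sketched the same way. The hypothetical helper encode_for_stream below stands in for the conversion that happens when text is written to a byte stream. With ascii as the target encoding it fails for non-ASCII text, while an explicit UTF-8 encode succeeds.

```python
def encode_for_stream(text, encoding="ascii"):
    """Encode text the way it must be encoded before going to a byte stream."""
    return text.encode(encoding)


text = u"bl\xe4"

try:
    encode_for_stream(text)            # ascii cannot represent U+00E4
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = encode_for_stream(text, "utf-8")   # explicit encoding works
lossy = text.encode("ascii", "replace")         # ä is replaced by "?"

print(ascii_ok)
print(utf8_bytes == b"bl\xc3\xa4")
print(lossy == b"bl?")
```

The "replace" error handler avoids the exception but silently loses the character, so it is rarely what you want for user messages.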

These automatic conversions are often the source of exceptions (namely when the conversion fails). If your code catches exceptions too broadly, these failures can go unnoticed, and then some feature just silently does not work.
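One way to avoid relying on implicit conversion is to decode bytes explicitly where they enter the server and to fail loudly if that does not work. The sketch below assumes the Android client sends the message body as UTF-8, which the question does not state. The function name decode_sms_body is hypothetical, and the code is written to run on Python 2.7 and Python 3.

```python
def decode_sms_body(raw_body):
    """Decode bytes received from the client into unicode text.

    Assumes the client encoded the message as UTF-8 (an assumption of
    this sketch, not something confirmed by the question).
    """
    try:
        return raw_body.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Raise a specific, visible error rather than swallowing it.
        raise ValueError("request body is not valid UTF-8: %s" % exc)


# UTF-8 bytes for a six-code-point Hindi greeting.
raw = b"\xe0\xa4\xa8\xe0\xa4\xae\xe0\xa4\xb8\xe0\xa5\x8d\xe0\xa4\xa4\xe0\xa5\x87"
message = decode_sms_body(raw)

print(len(raw))       # length in bytes
print(len(message))   # length in code points

try:
    decode_sms_body(b"\xff")   # 0xff is never valid in UTF-8
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Inside the application the text then stays unicode, and it is encoded again only when it is written out, for example with message.encode("utf-8").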
