
I'm using Python to call an API that returns the last names of some soccer players. One of the players has a "ć" in his name.

When I call the endpoint, the name prints with a Unicode escape sequence in place of the character:

>>> last_name = json.dumps(response["response"][2]["player"]["lastname"])
>>> print(last_name)
"Mitrovi\u0107"
>>> print(type(last_name))
<class 'str'>

If I copy and paste that output into a string literal of its own, like so:

>>> print("Mitrovi\u0107")
Mitrović
>>> print(type("Mitrovi\u0107"))
<class 'str'>

Then it prints just fine?

What is wrong with the API endpoint call and the string that comes from it?

2 Answers

You serialise the string with json.dumps() before printing it; that's why you get a different output. Compare the following:

>>> print("Mitrović")
Mitrović

and

>>> print(json.dumps("Mitrović"))
"Mitrovi\u0107"

The second command adds double quotes to the output and, by default, escapes non-ASCII characters, because json.dumps() returns the JSON encoding of the string rather than the string itself. So it's likely that response["response"][2]["player"]["lastname"] already contains exactly what you want, and you fooled yourself by wrapping it in json.dumps() before printing.
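
To make this concrete, here is a minimal sketch that uses a hand-built dictionary in place of the real API response. The `response` structure below is an assumption modelled only on the indexing shown in the question, and the other entries are placeholders:

```python
import json

# Hypothetical stand-in for the decoded API response; only the shape
# response["response"][2]["player"]["lastname"] is taken from the question.
response = {
    "response": [
        {"player": {"lastname": "Placeholder"}},
        {"player": {"lastname": "Placeholder"}},
        {"player": {"lastname": "Mitrović"}},
    ]
}

last_name = response["response"][2]["player"]["lastname"]
serialised = json.dumps(last_name)

print(last_name)               # the plain Python string
print(serialised)              # the JSON text for that string
print(json.loads(serialised))  # decoding the JSON text gives the original back
```

If the real response behaves like this stand-in, printing the value directly, without json.dumps(), should be all that is needed.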

Note: don't confuse Python string literals with the JSON serialisation of strings. They share some features, but they aren't the same (e.g. JSON strings can't be single-quoted), and they serve different purposes (the first is for writing strings in source code, the second is for encoding data to send across the network).
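
A small sketch of that difference, using only the standard library: repr() shows a string roughly the way Python source code would write it, while json.dumps() produces JSON text, and the JSON parser rejects a single-quoted string. The variable names are only for illustration:

```python
import json

name = "Mitrović"

python_literal = repr(name)    # Python source-style representation
json_text = json.dumps(name)   # JSON serialisation (ASCII-only by default)

print(python_literal)
print(json_text)

# A single-quoted string is valid Python source but not valid JSON.
try:
    json.loads("'Mitrovic'")
    single_quotes_accepted = True
except json.JSONDecodeError:
    single_quotes_accepted = False

print(single_quotes_accepted)
```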

Another note: You can avoid most of the escaping with ensure_ascii=False in the json.dumps() call:

>>> print(json.dumps("Mitrović", ensure_ascii=False))
"Mitrović"

1 Comment

Thank you very much for your help, I didn't realize what json.dumps() was doing.

Count the number of characters in your string and I'll bet you'll notice that the result of json.dumps() is 13 characters (15 if you include the surrounding double quotes):

"M-i-t-r-o-v-i-\-u-0-1-0-7", or "Mitrovi\\u0107"

When you copy "Mitrovi\u0107" into Python source you're copying 8 characters, because the '\u0107' is a single Unicode character.
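
The character counts above can be checked directly. This is just a sketch with the name hard-coded rather than fetched from the API:

```python
import json

literal = "Mitrovi\u0107"        # the escape is resolved by the Python parser
dumped = json.dumps("Mitrović")  # the escape is written out as six characters

print(len(literal))              # 8
print(len(dumped))               # 15: 13 characters plus two double quotes
print(len(dumped.strip('"')))    # 13
print("\\" in dumped)            # True: a real backslash character is present
```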

That would suggest the endpoint is not sending properly JSON-escaped Unicode, or that somewhere in your code you're reading it as ASCII first. Look carefully at exactly what you're receiving.

1 Comment

Ah, that makes sense as to why I saw the "\\" when trying to figure this out.
