
On a website, the word pluș is sent via POST to a Django view. It arrives as plu%25C8%2599. I took that string and tried to find a way to turn %25C8%2599 back into ș.

I tried decoding the string like this:

from urllib import unquote_plus
s = "plu%25C8%2599"
print unquote_plus(unquote_plus(s).decode('utf-8'))

The result I get is pluÈ, which actually has a length of 5, not 4.

How can I get the original string pluș back after it has been encoded?

edit:

I managed to do it like this:

def js_unquote(quoted):
  quoted = quoted.encode('utf-8')
  quoted = unquote_plus(unquote_plus(quoted)).decode('utf-8')
  return quoted

It looks weird but works the way I needed it.

  • Which Unicode character is that supposed to be? U+0219 (Latin small letter s with comma below) or U+015F (Latin small letter s with cedilla)? Commented Dec 2, 2010 at 14:57
  • @bgporter: S-comma, U+0219. It's irrelevant which letter it is, though, as I'm having this problem with other Unicode letters such as ț, â, etc. Commented Dec 2, 2010 at 15:19

3 Answers


URL-decode twice, then decode as UTF-8.
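
A minimal sketch of this on Python 3 (an assumption, since the question uses Python 2). There, `urllib.parse.unquote_plus` already decodes the percent-escaped bytes as UTF-8 by default, so the separate `.decode('utf-8')` step is not needed. The name `js_unquote` is borrowed from the question's edit. `mis_decoded` is a hypothetical name that only illustrates how the bytes C8 99 become two characters when read as Latin-1 instead of UTF-8, which would be consistent with the length-5 result in the question.

```python
from urllib.parse import unquote_plus, unquote_to_bytes


def js_unquote(quoted):
    """Undo two layers of percent-encoding and return text."""
    once = unquote_plus(quoted)   # "%25" becomes "%", leaving "plu%C8%99"
    return unquote_plus(once)     # bytes C8 99 are decoded as UTF-8


decoded = js_unquote("plu%25C8%2599")

# Illustration only: the same two bytes read as Latin-1 give two characters.
mis_decoded = unquote_to_bytes("plu%C8%99").decode("latin-1")

print(repr(decoded), len(decoded))
print(repr(mis_decoded), len(mis_decoded))
```

The order matters: both rounds of unquoting have to finish before the bytes are interpreted as UTF-8.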


6 Comments

If I URL-decode twice and then decode as UTF-8, I get: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128) :(
@yoshi: That's your console being unable to output it. The result is fine.
print unquote_plus(unquote_plus('plu%25C8%2599')).decode('utf-8') gives me pluș. But when I put it inside the application, it raises that ascii error :| It's on the same machine, in the same environment.
Is the decoded result being saved to a database? If so, I'm guessing it might be the database not configured with utf-8 as the encoding.
The result is saved in a MySQL database but it has utf8 encoding. The problem occurs before querying the database.
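
One possible explanation for the console/application difference, offered as an assumption rather than a diagnosis: the view may receive Unicode text instead of a byte string, and on Python 2 calling `.decode('utf-8')` on Unicode text first encodes it implicitly with the ascii codec. The Python 3 sketch below only reproduces that failure explicitly and shows one way back to the intended text. All variable names are hypothetical.

```python
# Text as it might look after unquoting a Unicode string on Python 2:
# the bytes C8 99 have become the two characters U+00C8 and U+0099.
half_decoded = "plu\xc8\x99"

try:
    half_decoded.encode("ascii")   # what the implicit step would attempt
    failed_span = None
except UnicodeEncodeError as exc:
    failed_span = (exc.start, exc.end - 1)   # inclusive character positions

# Mapping each character back to one byte, then decoding as UTF-8, repairs it.
repaired = half_decoded.encode("latin-1").decode("utf-8")

print(failed_span)
print(repr(repaired))
```

If this is what happens, encoding the input to a byte string before unquoting, as in the question's edit, avoids the implicit ascii step.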

You can't, unless you know what the encoding is. Unicode itself is not an encoding. You might try BeautifulSoup and its UnicodeDammit class, which can try to guess the encoding and might get you the result you were hoping for.

http://www.crummy.com/software/BeautifulSoup/

I hope this helps!

Also take a look at:

http://www.joelonsoftware.com/articles/Unicode.html

2 Comments

I should note that Ignacio's answer is the solution for your specific problem, but keep in mind that you won't always know the encoding of what your user is posting.
Also, please excuse my suggestion if you were already familiar with those links and with how Unicode and encodings work :)
0
unquote_plus(s).encode('your_lang_encoding')

I tried it like that. I was sending a JSON POST request from an HTML form directly to a Django URI, and the data included Unicode characters such as "şğüöçı+". It works. I used the iso_8859-9 encoding in the encode() call.
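
For reference, a small round trip along these lines on Python 3 (an assumption, since the snippet above is Python 2 style). It assumes the form data was percent-encoded as UTF-8, which is the default of `quote_plus`, and the variable names are made up for illustration. Encoding to ISO 8859-9 only works while every character exists in that code page, so it would not cover the ș (U+0219) from the question.

```python
from urllib.parse import quote_plus, unquote_plus

sample = "şğüöçı+"

quoted = quote_plus(sample)        # percent-encoded as UTF-8; "+" becomes "%2B"
text = unquote_plus(quoted)        # back to the original text
latin5 = text.encode("iso8859_9")  # one byte per character in ISO 8859-9

# The s-comma from the question is not part of ISO 8859-9.
try:
    "plu\u0219".encode("iso8859_9")
    fits_latin5 = True
except UnicodeEncodeError:
    fits_latin5 = False

print(quoted)
print(len(latin5), fits_latin5)
```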
