1

I have implemented a PHP script.
I run my PHP script via the following URL : http://server/script.php?param1=%80t%80

So I pass a GET parameter to my PHP script.
The parameter is named param1.
param1 contains the string "€t€" which is URL-encoded as "%80t%80".

My PHP script file is saved as UTF-8.
I was wondering which character encoding applies to the string contained in $_GET["param1"].

The character encoding of $_GET["param1"] is certainly not UTF-8.
The reason: the following command in my PHP script outputs "80 74 80", which is the hexadecimal representation of $_GET["param1"].

var_dump(unpack("H*", $_GET["param1"]));

If the character encoding of $_GET["param1"] were UTF-8, the previous PHP command would output "e2 82 ac 74 e2 82 ac".
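
To make the comparison concrete, here is a minimal sketch that dumps both byte sequences side by side. It assumes the received value is exactly the three bytes 0x80, 0x74, 0x80; the variable names are illustrative only.

```php
<?php
$received = "\x80t\x80";                  // bytes produced by %80t%80
$utf8     = "\xE2\x82\xACt\xE2\x82\xAC";  // "€t€" encoded as UTF-8

// unpack("H*", ...) returns an array whose element 1 holds the hex string.
$hexReceived = unpack("H*", $received)[1];
$hexUtf8     = unpack("H*", $utf8)[1];

echo $hexReceived, "\n"; // 807480
echo $hexUtf8, "\n";     // e282ac74e282ac

// With the /u modifier, preg_match() fails on subjects that are not
// valid UTF-8, so it can serve as a cheap validity check.
var_dump(preg_match('//u', $received) === 1); // bool(false)
var_dump(preg_match('//u', $utf8) === 1);     // bool(true)
```

A lone 0x80 byte can never appear in valid UTF-8, which is consistent with the conclusion that the received string is not UTF-8.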

The character encoding of $_GET["param1"] is not ISO-8859-1 either, because the € symbol is not part of the ISO-8859-1 charset.
To view the ISO-8859-1 encoding table, go to http://en.wikipedia.org/wiki/ISO/IEC_8859-1
So the PHP internal encoding returned by the mb_internal_encoding function, which is ISO-8859-1, does not apply to $_GET["param1"].

Does anyone know which character encoding applies to the string contained in $_GET["param1"]?

0

3 Answers

0

I am not sure I understand why you are using unpack to solve a character-encoding problem. So here it goes...

I suppose you are trying to read the value of $_GET['param1'] with something like:

$var = $_GET['param1'];

I suggest you try urldecode:

$var = urldecode($_GET['param1']);

and then use the functions for handling multibyte strings (http://gr.php.net/manual/en/ref.mbstring.php) or the iconv functions.
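
One thing worth checking before adding urldecode: the PHP manual notes that the values in $_GET are already URL-decoded. The sketch below uses parse_str() as a stand-in for a real request (it parses a string as if it were the query string of a URL); the sample values are made up for illustration.

```php
<?php
// parse_str() decodes a query string the way PHP does when filling $_GET.
parse_str('param1=%80t%80', $get);
$firstHex = bin2hex($get['param1']);
echo $firstHex, "\n";                 // 807480 (already decoded)

// A value that should still contain the literal text "%2B" after decoding:
parse_str('param1=a%252Bb', $get);
$once  = $get['param1'];
$twice = urldecode($once);            // decoding a second time

echo $once, "\n";                     // a%2Bb
echo $twice, "\n";                    // a+b (no longer what was sent)
```

For the %80t%80 case a second urldecode happens to change nothing, but for values containing "%" or "+" it alters the data, so calling it on $_GET elements is usually unnecessary.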

Hope the above helps.


1 Comment

I used the unpack function only for testing: I needed to see the bytes representing the string contained in param1. My final goal is to convert each string received from a GET parameter to UTF-8. I plan to use the mb_convert_encoding PHP function, but I need to know which encoding is initially used to represent the strings in the $_GET array.
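
A possible shape for that conversion step is sketched below. The helper name toUtf8 is hypothetical, and the source encoding is an assumption: Windows-1252 is one encoding in which the byte 0x80 is the € sign, but the real source encoding depends on whatever built the URL. iconv is used here; mb_convert_encoding($value, 'UTF-8', 'Windows-1252') would be the mbstring equivalent.

```php
<?php
/**
 * Hypothetical helper: return $value as UTF-8.
 * Assumption: anything that is not already valid UTF-8 is in $assumedSource.
 */
function toUtf8(string $value, string $assumedSource = 'Windows-1252'): string
{
    // Valid UTF-8 (plain ASCII included) is returned unchanged.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv($assumedSource, 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException('Cannot convert from ' . $assumedSource);
    }
    return $converted;
}

echo bin2hex(toUtf8("\x80t\x80")), "\n"; // e282ac74e282ac
```

The validity check keeps the helper from converting a value twice when a client already sent UTF-8.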
0

For sure the character encoding on $_GET["param1"] is not UTF-8. The reason : The following command in my PHP script results to "80 74 80" which is the hexadecimal representation of $_GET["param1"].

This is exactly what you'd expect, because it's what you've written. The parameter %80t%80 means three bytes: hex 80, "t" (hex 74), hex 80. %80 means "the byte with hex value 80". You're manually specifying the hex values, so character encoding doesn't come into this at all.

Try this:

var_dump( unpack ("H*", urldecode("%80t%80")));

And this:

http://server/script.php?param1=%e2%82%ac%74%e2%82%ac
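
A short sketch of what these two suggestions show, runnable from the command line without a web server. The UTF-8 bytes for € are written as escapes so the result does not depend on how the file is saved.

```php
<?php
// Percent-decoding just turns each %XX into the byte XX.
$fromQuestion = urldecode('%80t%80');
echo bin2hex($fromQuestion), "\n";   // 807480

// The same text sent as percent-encoded UTF-8 bytes:
$fromUtf8 = urldecode('%e2%82%ac%74%e2%82%ac');
echo bin2hex($fromUtf8), "\n";       // e282ac74e282ac

// Going the other way: percent-encoding a UTF-8 string yields those escapes.
$encoded = rawurlencode("\xE2\x82\xACt\xE2\x82\xAC");
echo $encoded, "\n";                 // %E2%82%ACt%E2%82%AC
```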

0

According to https://www.w3schools.com/tags/ref_urlencode.ASP:

"URLs can only be sent over the Internet using the ASCII character-set."

Actually a subset of ASCII seems to be what URL-encoding targets. https://www.php.net/manual/en/function.urlencode.php says urlencode():

"is convenient when encoding a string to be used in a query part of a URL, ... to pass variables to the next page... [It] Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs."

Those are evidently mappings into alphanumeric ASCII (plus at least ._-+%) from other ASCII characters or from characters represented in UTF-8 or other charsets. Mapping back to, say, UTF-8 might require knowing which encoding you started with. Yet the reverse direction is unclear: https://www.php.net/manual/en/function.urldecode.php says that urldecode():

"Decodes any %## encoding in the given string. Plus symbols ('+') are decoded to a space character."

Not only is it unclear whether the decoded output is UTF-8 or something else, but (the reason I found myself looking at this question) when I wanted a '+' to be sent within a parameter via the URL (which arrives in PHP as $_GET["paramName"]), it arrived as a space (' '), as warned above. That stopped once I used urlencode($paramVal) to convert '+' to %2B before inserting the value into the URL. It then shows up in the browser's address bar as ?paramName=...%2B... but arrives in PHP decoded as a '+'.
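
The '+' behaviour can be reproduced without a browser. This sketch uses parse_str() to mimic how PHP fills $_GET; the value 'a+b' is a made-up example.

```php
<?php
// A raw '+' in the query string is decoded to a space.
parse_str('paramName=a+b', $get);
$rawPlus = $get['paramName'];
var_dump($rawPlus);                    // string(3) "a b"

// Encoding the value first turns '+' into %2B, which survives the trip.
$query = 'paramName=' . urlencode('a+b');
echo $query, "\n";                     // paramName=a%2Bb
parse_str($query, $get);
$encodedPlus = $get['paramName'];
var_dump($encodedPlus);                // string(3) "a+b"

// http_build_query() applies the same encoding to every value.
$built = http_build_query(['paramName' => 'a+b']);
echo $built, "\n";                     // paramName=a%2Bb

// rawurldecode() follows RFC 3986 and leaves '+' alone.
echo rawurldecode('a+b'), "\n";        // a+b
```

Building query strings with urlencode() or http_build_query() rather than by plain concatenation avoids the problem for any character, not just '+'.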

I would hope that the output of PHP urldecode() is UTF-8 since I had among my HTML headers <meta charset="UTF-8">, but that's a guess.

I don't find my answer too helpful to you, but if I could have read my answer to your question, which came up while searching to solve my problem, it would have saved me an hour. Maybe it'll help the next person.
