1

I am scraping data out of a file, and from that data I'm extracting the year. When I try to convert that year (2011) to an int, I get a weird result (2). Here's what my code looks like; $year is the value I am getting from the file.

$year_int = (int) $year;

var_dump($year); //Return string(8) "2011"
var_dump($year_int); //Return int(2)

I expect $year_int to be int(2011). And why is $year a string(8)? Shouldn't it be a string(4)?

  • Also unable to reproduce; the code as presented in the OP is legit. Commented Oct 29, 2011 at 19:08
  • Actually, what is odd is that var_dump($year) says string(8); it should be string(4). Commented Oct 29, 2011 at 19:09
  • @DigitalPrecision - You're missing a fact the OP did not mention. You really do not know what the original data (2011) was, except for what the OP got from the PHP result (which is in fact wrong anyway). See the OP's code in action here: codepad.viper-7.com/A3nVjX Commented Oct 29, 2011 at 19:18
  • @ChristianSciberras: Actually, according to his comments after the var_dump of the year, he said it just spits out 2011. Commented Oct 29, 2011 at 19:26
  • @DigitalPrecision Didn't you see my link? It did say '2011' as well... Commented Oct 29, 2011 at 19:37

2 Answers

3

I reckon your string is UTF-16-encoded, so each character is encoded with 16 bits, or 2 bytes. PHP still treats it as an ASCII string: the (int) cast reads the 1st byte (2), then the 2nd byte (a NUL character), and stops there.
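The effect can be reproduced with a hand-built string. This is only a sketch: the byte sequence below is an assumed stand-in for whatever the file actually contained, written as little-endian UTF-16 without a byte-order mark.

```php
<?php
// Hypothetical input: "2011" as UTF-16LE, i.e. each ASCII character
// followed by a NUL byte. The real data from the file may differ.
$year = "2\x000\x001\x001\x00";

var_dump(strlen($year)); // int(8): strlen() counts bytes, not characters
var_dump((int) $year);   // int(2): numeric parsing stops at the first NUL byte
```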

iconv('UTF-16', 'ASCII', $year) should help

EDIT: I guessed that the string is in UTF-16 because its characters, while being ASCII, took up 2 bytes each. Your string could be in one of the Asian double-byte encodings, but it is still most likely UTF-16, and you're likely on Windows, because UTF-16 is Windows' internal encoding.
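A minimal sketch of the conversion, assuming the data is UTF-16LE without a byte-order mark (which is what a comment in this thread reports for the original file) and that the mbstring extension is available. The sample bytes are hypothetical. Naming the byte order explicitly means the converter does not have to guess it.

```php
<?php
// Hypothetical input: "2011" as UTF-16LE bytes without a byte-order mark.
$year = "2\x000\x001\x001\x00";

// Convert to UTF-8 (a superset of ASCII), stating the source byte order.
$decoded = mb_convert_encoding($year, 'UTF-8', 'UTF-16LE');
// iconv('UTF-16LE', 'UTF-8', $year) is the iconv counterpart.

$year_int = (int) $decoded;

var_dump($decoded);  // string(4) "2011"
var_dump($year_int); // int(2011)
```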

Here's a good starter article on Unicode: http://www.joelonsoftware.com/articles/Unicode.html


4 Comments

I was going to ask how you guessed that encoding, then I realized that if the encoding were UTF-8, it would still work, since UTF-8 is compatible with ASCII. Perhaps you might want to write a formal explanation of how you came to your guess?
This is returning an empty string. And had to change UTF16 to UTF-16 for it to work.
@LeonidShevtsov I'm sure it will help future readers, thanks.
iconv wasn't working. mb_convert_encoding worked. And it turned out the original encoding was UTF-16LE
1

string(8) "2011" - does nothing seem odd to you about that? Maybe the fact that there are only four characters visible?

Try this:

for( $i=0; $i<strlen($year); $i++) echo ord($year[$i])." ";

See what that gives you. If it were correct, it should print "50 48 49 49".
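For comparison, here is the same kind of byte dump run on a hand-built string. The sample bytes are an assumption (little-endian UTF-16 without a byte-order mark), not the asker's actual data.

```php
<?php
// Hypothetical input: "2011" as UTF-16LE bytes.
$year = "2\x000\x001\x001\x00";

// Decimal value of every byte in the string.
$codes = array_map('ord', str_split($year));
echo implode(' ', $codes), "\n"; // 50 0 48 0 49 0 49 0

// The same bytes in hexadecimal.
echo bin2hex($year), "\n";       // 3200300031003100
```

A zero after every digit is the pattern UTF-16LE produces for ASCII text.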

Chris edit: Thought I'd expand on this answer. Please see the example here on what Kolink meant by "invisible" characters.

6 Comments

it returned 50 0 48 0 49 0 49 0 but what does it mean?
A str_replace("\0",'',$year) worked. Is there a better way to do it?
@chaft The formal way to do this is as Leonid advised.
@ChristianSciberras Leonid's method is returning an empty string.
@chaft First off, the str_replace is a short-term fix. It will haunt you until you find out why it is happening. Secondly, please tell us where '2011' originally came from. Did you type it in your code editor, or a form in a web browser? Did you store it in a DB? After that, we will be able to advise a fix. If you typed it in your code editor, be sure your code is in a file with the right encoding (namely, UTF-8 or plain ASCII).
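One way to see why stripping NUL bytes is only a short-term fix is to try it on text that is not pure ASCII. The sketch below uses a made-up sample ("café" as UTF-16LE) and assumes the mbstring extension is available.

```php
<?php
// Hypothetical input: "café" as UTF-16LE bytes (é is U+00E9, stored as E9 00).
$text = "c\x00a\x00f\x00\xE9\x00";

// Stripping NUL bytes leaves the lone byte E9, which is not valid UTF-8.
$stripped = str_replace("\0", '', $text);

// A real conversion re-encodes é as the two UTF-8 bytes C3 A9.
$decoded = mb_convert_encoding($text, 'UTF-8', 'UTF-16LE');

var_dump(mb_check_encoding($stripped, 'UTF-8')); // bool(false)
var_dump(bin2hex($decoded));                     // string(10) "636166c3a9"
```

For a plain year such as 2011 both approaches give the same digits, so the difference only shows up once other characters appear in the data.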