
I am trying to process data that I fetched with curl, but I have encoding issues and cannot find the right way to handle them.

This is the text I got (in hex): '6B 64 6F 20 6D C3 A1'. It should evaluate to the string 'kdo má', but instead it evaluates to 'kdo m??' (actually, the last two characters aren't question marks but U+00C3 and U+00A1: http://www.fileformat.info/info/unicode/char/c3/index.htm and http://www.fileformat.info/info/unicode/char/a1/index.htm).

I don't understand why some chars are 8bit and diacritic chars are 16 bit and how should PHP know which one is which, but anyway, how should I decode it?

  • “I don't understand why some chars are 8bit and diacritic chars are 16 bit” – because that’s how a variable-width encoding works … Commented Sep 16, 2013 at 21:53
  • You're probably getting UTF-8 text, which uses bytes with the high bit set for its multi-byte sequences (the lower 7 bits of UTF-8 correspond 1:1 with US-ASCII). But you're probably dumping that UTF-8 text into an environment that uses a different charset, e.g. ISO-8859, where the UTF-8 high-bit bytes don't have their intended meaning. Commented Sep 16, 2013 at 21:54

1 Answer


“I don't understand why some chars are 8bit and diacritic chars are 16 bit”

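Because the text is UTF-8, which is a variable-width encoding: ASCII characters take one byte each, while á (U+00E1) is encoded as the two bytes C3 A1. By default, PHP's string functions assume one character == one byte.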
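To make the byte/character difference concrete, here is a minimal sketch using the bytes from the question. It assumes the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// The bytes from the question: 6B 64 6F 20 6D C3 A1 ("kdo má" in UTF-8).
$text = "kdo m\xC3\xA1";

// strlen() counts bytes, so the two-byte "á" is counted twice.
$byteCount = strlen($text);

// mb_strlen() counts characters once it is told the encoding.
$charCount = mb_strlen($text, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters\n"; // 7 bytes, 6 characters
```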

“and how should PHP know which one is which, but anyway, how should I decode it?”

It can't know; you have to tell it. Check mbstring: http://php.net/manual/de/book.mbstring.php or recode: http://php.net/manual/en/book.recode.php
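As a rough sketch of what "telling it" can look like with mbstring: the example below assumes the response body is UTF-8 (as the hex dump in the question suggests) and that the consumer of the string expects ISO-8859-1. `$raw` is a hypothetical stand-in for whatever curl_exec() returned.

```php
<?php
// Hypothetical stand-in for the body returned by curl_exec().
$raw = hex2bin('6b646f206dc3a1');

// Check the assumption that the bytes are valid UTF-8.
$isUtf8 = mb_check_encoding($raw, 'UTF-8');

// Convert only if the target really is a single-byte charset such as ISO-8859-1.
// Argument order: string, target encoding, source encoding.
$latin1 = mb_convert_encoding($raw, 'ISO-8859-1', 'UTF-8');

// In ISO-8859-1, "á" is the single byte E1.
echo bin2hex($latin1), "\n"; // 6b646f206de1
```

If the output side (web page, terminal, database connection) can be set to UTF-8 instead, the string may not need converting at all; for a web page that would mean sending something like header('Content-Type: text/html; charset=utf-8') before any output.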


1 Comment

I tried everything I found. utf8_encode/decode, mbstring functions, iconv ... but nothing helped.
