
My task is simple: make a POST request to translate.google.com and get the translation. In the following example I'm translating the word "hello" into Russian.

<?php
header('Content-Type: text/plain; charset=utf-8');  // optional
error_reporting(E_ALL | E_STRICT);

$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => implode("\r\n", array(
            'Content-type: application/x-www-form-urlencoded',
            'Accept-Language: en-us,en;q=0.5', // optional
            'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' // optional
        )),
        'content' => http_build_query(array(
            'prev'  =>  '_t',
            'hl'    =>  'en',
            'ie'    =>  'UTF-8',
            'text'  =>  'hello',
            'sl'    =>  'en',
            'tl'    =>  'ru'
        ))
    )
));

$page = file_get_contents('http://translate.google.com/translate_t', false, $context);

require '../simplehtmldom/simple_html_dom.php';
$dom = str_get_html($page);
$translation = $dom->find('#result_box', 0)->plaintext;
echo $translation;

Lines marked as optional can be removed without changing the output. The problem is that I'm getting weird characters:

������

I then tried

echo mb_convert_encoding($translation, 'UTF-8');

but I get

ÐÒÉ×ÅÔ

Does anybody know how to solve this problem?

UPDATE:

  1. I forgot to mention that all my PHP files are encoded in UTF-8 without a BOM.
  2. When I change the target language to "en" (i.e., translate from English to English), it works fine.
  3. I don't think the library I'm using is the cause, because I also tried outputting the whole $page without passing it to the library functions.
  4. I'm using PHP 5.
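For reference, both garbled outputs are consistent with the server returning "привет" ("hello" in Russian) encoded as KOI8-R rather than UTF-8. That is six bytes, which a page expecting UTF-8 shows as six replacement characters; read as ISO-8859-1, the same bytes become ÐÒÉ×ÅÔ. When no source encoding is passed, mb_convert_encoding uses the internal encoding, so the second output fits if that was ISO-8859-1. Treating the response as KOI8-R is an inference from the output, not something the question confirms; the sketch simply rebuilds those bytes by hand and decodes them both ways.

```php
<?php
// Assumption: these are the raw bytes of the translation (KOI8-R "привет").
$raw = "\xD0\xD2\xC9\xD7\xC5\xD4";

// Decoding them as ISO-8859-1 reproduces the second output from the question.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Decoding the same bytes as KOI8-R yields the expected Russian word.
$asKoi8r = mb_convert_encoding($raw, 'UTF-8', 'KOI8-R');

echo $asLatin1, "\n"; // ÐÒÉ×ÅÔ
echo $asKoi8r, "\n";  // привет
```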
  • Is your string still garbled if you echo $page directly? Commented Apr 3, 2009 at 10:19
  • No, only the translation is garbled. Commented Apr 3, 2009 at 10:25
  • It seems that the external library you're using (simple_html_dom) is messing it up. Either it's badly written or there's an option for this in their API somewhere. You might want to add this info to your question. Commented Apr 3, 2009 at 10:31
  • I do not think the library I'm using is messing it up, because I tried to output the whole $page without passing it to the library functions. Commented Apr 3, 2009 at 10:37

3 Answers


See whether this post helps: "CURL import character encoding problem".

You can also try this snippet (taken from php.net):

<?php
// Read a file and convert it to UTF-8, guessing its encoding from UTF-8 or ISO-8859-1.
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
?>
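One caveat worth checking before relying on this helper: its detection list only distinguishes UTF-8 from ISO-8859-1, and ISO-8859-1 accepts any byte sequence. The sketch assumes, as an inference from the question's outputs, that the response bytes are "привет" in KOI8-R, writes them to a temporary file, and calls the helper exactly as written. Detection falls through to ISO-8859-1, and the result is the same as the question's second output.

```php
<?php
// Copy of the php.net helper, so this sketch runs on its own.
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

// Assumption: KOI8-R bytes for "привет", as inferred from the question's output.
$raw = "\xD0\xD2\xC9\xD7\xC5\xD4";

// Not valid UTF-8, so strict detection moves on to ISO-8859-1, which always matches.
$detected = mb_detect_encoding($raw, 'UTF-8, ISO-8859-1', true);
echo $detected, "\n"; // ISO-8859-1

// Round-trip through a temporary file to call the helper unchanged.
$path = tempnam(sys_get_temp_dir(), 'koi');
file_put_contents($path, $raw);
$converted = file_get_contents_utf8($path);
unlink($path);
echo $converted, "\n"; // ÐÒÉ×ÅÔ
```

When the real source encoding is known, passing it explicitly as the third argument of mb_convert_encoding avoids the guess altogether.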

2 Comments

Yes, I already tried that, and its output is the same as the second output in my question.
This worked for me, thanks. I knew my file was in ISO-8859-1 because I entered the file name in Chrome and looked at the response headers, which include the encoding. You can also see the encoding by printing $http_response_header right after your file_get_contents call.
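Building on that comment, a minimal sketch that reads the charset from the headers and converts the body accordingly, instead of guessing. charset_from_headers and body_to_utf8 are hypothetical helpers. The sketch assumes the declared charset is one mbstring supports, and it uses canned headers in place of a live $http_response_header.

```php
<?php
// Hypothetical helper: charset declared by the final response's Content-Type header.
// $http_response_header also holds headers from any redirects, so each new
// status line ("HTTP/...") resets the result to $default.
function charset_from_headers(array $headers, $default = 'UTF-8')
{
    $charset = $default;
    foreach ($headers as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            $charset = $default;
        } elseif (preg_match('/^Content-Type:.*?charset=([^\s;]+)/i', $line, $m)) {
            $charset = trim($m[1], "\"'");
        }
    }
    return $charset;
}

// Hypothetical helper: convert a response body to UTF-8 using the declared charset.
function body_to_utf8($body, array $headers)
{
    $charset = charset_from_headers($headers);
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body;
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Canned headers standing in for $http_response_header after a real request.
$headers = array('HTTP/1.0 200 OK', 'Content-Type: text/html; charset=KOI8-R');
echo charset_from_headers($headers), "\n";                      // KOI8-R
echo body_to_utf8("\xD0\xD2\xC9\xD7\xC5\xD4", $headers), "\n"; // привет
```

In the question's code, this would be called as body_to_utf8($page, $http_response_header) directly after file_get_contents, before the page is handed to the HTML parser.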

First off, is your browser set to UTF-8? In Firefox you can set your text encoding in View->Character Encoding. Make sure you have "Unicode (UTF-8)" selected. I would also set View->Character Encoding->Auto-Detect to "Universal."

Secondly, you could try passing the FILE_TEXT flag, like so:

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);


Accept-Charset is not really optional here. You should specify UTF-8 there: Russian characters cannot be represented in ISO-8859-1.
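A sketch of the question's request with Accept-Charset set to UTF-8, as this answer suggests. translate_context is a hypothetical helper name, and the form fields are copied from the question. Whether the server honours the header is not something this sketch can confirm, so checking the charset it actually returns in $http_response_header is still worthwhile. The sketch only builds the context and inspects its options, so it makes no network request.

```php
<?php
// Hypothetical helper: the question's POST context, asking for UTF-8 explicitly.
function translate_context($text, $from, $to)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => implode("\r\n", array(
                'Content-type: application/x-www-form-urlencoded',
                'Accept-Charset: utf-8',
            )),
            'content' => http_build_query(array(
                'prev' => '_t', 'hl' => 'en', 'ie' => 'UTF-8',
                'text' => $text, 'sl' => $from, 'tl' => $to,
            )),
        ),
    ));
}

$context = translate_context('hello', 'en', 'ru');
$options = stream_context_get_options($context);
echo $options['http']['header'], "\n";
```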
