8

Here is the PHP code:

$arr=array(228,184,173,230,150,135,99,104,105,110,101,115,101);
$str='';
foreach ($arr as $i){
    $str.=chr($i);
}
print $str;

The output is: 中文chinese

Here is the JavaScript code:

var arr=[228,184,173,230,150,135,99,104,105,110,101,115,101];
var str='';
for (i in arr){
    str+=String.fromCharCode(arr[i]);
}
console.log(str);

The output is: ä¸­æchinese

So how should I process the array in JavaScript?

6
  • When I run the PHP code, I get the output 中文chinese. Is there anything special about your PHP configuration? Commented Dec 25, 2012 at 6:10
  • I get the same exact output as @Stegrex Commented Dec 25, 2012 at 6:12
  • @Stegrex Maybe it is a problem with the locale setting. You could try uncommenting zh_CN.XXX in /etc/locale.gen Commented Dec 25, 2012 at 6:23
  • I am not sure how it works out in your PHP code, but for JavaScript the correct array is [20013,25991,99,104,105,110,101,115,101] Commented Dec 25, 2012 at 6:25
  • @Stegrex: you are viewing it in ASCII. Interpret it as UTF-8. Commented Dec 25, 2012 at 6:27

5 Answers

19

JavaScript strings consist of UTF-16 code units, yet the numbers in your array are the bytes of a UTF-8 string. Here is one way to convert the string, which uses the decodeURIComponent() function:

var i, str = '';

for (i = 0; i < arr.length; i++) {
    str += '%' + ('0' + arr[i].toString(16)).slice(-2);
}
str = decodeURIComponent(str);

Performing the UTF-8 to UTF-16 conversion in the conventional way is likely to be more efficient but would require more code.
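In runtimes that provide the WHATWG Encoding API, the same byte array can also be decoded without building a percent-encoded string. This is a minimal sketch of an alternative that the answer itself does not mention; it assumes a modern browser or a recent Node.js where TextDecoder is a global, and utf8BytesToString is a hypothetical helper name.

```javascript
// Assumption: TextDecoder is available as a global (modern browsers, recent Node.js).
var arr = [228,184,173,230,150,135,99,104,105,110,101,115,101];

// Hypothetical helper name, introduced only for this sketch.
function utf8BytesToString(bytes) {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8
    // instead of silently substituting U+FFFD replacement characters.
    var decoder = new TextDecoder('utf-8', { fatal: true });
    return decoder.decode(new Uint8Array(bytes));
}

var decoded = utf8BytesToString(arr);
console.log(decoded);
```

With fatal set to true, malformed input raises a TypeError, which plays the same role as the URIError thrown by decodeURIComponent().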

2 Comments

@PleaseStand, What do you mean by "conversion in the conventional way"? What "conventional way" are you referring to?
Using your method I got a URIError: URI malformed
6
var arry = [3,5,7,9];
console.log(arry.map(String));

The result will be ['3','5','7','9'].

var arry = ['3','5','7','9'];
console.log(arry.map(Number));

The result will be [3,5,7,9].

3

Another solution, without decodeURIComponent, for characters that take up to 3 bytes in UTF-8 (code points up to U+FFFF). The function assumes the input is valid UTF-8 and does very little error checking.

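function atos(arr) {
    // Decodes 1- to 3-byte UTF-8 sequences; 4-byte sequences are not handled.
    // Looping on i < l (rather than on the byte value) keeps a 0 byte from ending the loop early.
    for (var i = 0, l = arr.length, s = '', c; i < l;) {
        c = arr[i++];
        s += String.fromCharCode(
            c > 0xdf && c < 0xf0 && i < l - 1
                // 3-byte sequence: 4 + 6 + 6 payload bits
                ? (c & 0xf) << 12 | (arr[i++] & 0x3f) << 6 | arr[i++] & 0x3f
            : c > 0x7f && i < l
                // 2-byte sequence: 5 + 6 payload bits
                ? (c & 0x1f) << 6 | arr[i++] & 0x3f
            // single byte (ASCII)
            : c
        );
    }

    return s;
}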
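The function above stops at U+FFFF. As a hedged sketch of how the same bit-packing idea could be extended to 4-byte sequences, the version below uses String.fromCodePoint (ES2015) so that code points above U+FFFF become surrogate pairs. The name atos4 is hypothetical, and the code still assumes the input is valid UTF-8.

```javascript
// Hypothetical extension of atos(); assumes valid UTF-8 input and ES2015 support.
function atos4(arr) {
    var s = '', i = 0, l = arr.length, c, cp;
    while (i < l) {
        c = arr[i++];
        if (c < 0x80) {
            // 1 byte: ASCII
            cp = c;
        } else if (c < 0xe0) {
            // 2 bytes: 5 + 6 payload bits
            cp = (c & 0x1f) << 6 | arr[i++] & 0x3f;
        } else if (c < 0xf0) {
            // 3 bytes: 4 + 6 + 6 payload bits
            cp = (c & 0x0f) << 12 | (arr[i++] & 0x3f) << 6 | arr[i++] & 0x3f;
        } else {
            // 4 bytes: 3 + 6 + 6 + 6 payload bits
            cp = (c & 0x07) << 18 | (arr[i++] & 0x3f) << 12 |
                 (arr[i++] & 0x3f) << 6 | arr[i++] & 0x3f;
        }
        // fromCodePoint emits a surrogate pair for code points above U+FFFF
        s += String.fromCodePoint(cp);
    }
    return s;
}

// 240,159,152,128 are the UTF-8 bytes of U+1F600
var sample = atos4([228,184,173,240,159,152,128]);
console.log(sample);
```

Malformed input (a stray continuation byte, a truncated sequence) is not detected here and will produce wrong characters or a RangeError, so validate first if the bytes come from an untrusted source.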

1 Comment

I tested this with Chinese, Russian, Hebrew and English, and it works. The code is not very readable, but it's the right approach.
3

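If the array holds UTF-16 code units rather than UTF-8 bytes (so it does not decode the array in the question as-is), the most concise way these days seems to be the following: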

function bufferToString(arr){
    return arr.map(function(i){return String.fromCharCode(i)}).join("")
}
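To see which kind of input this per-element mapping suits, here is a small sketch that runs it on both arrays that appear on this page. The variable names are assumptions made for the illustration.

```javascript
function bufferToString(arr){
    return arr.map(function(i){return String.fromCharCode(i)}).join("")
}

// UTF-8 bytes from the question and the UTF-16 code units from the comments
var utf8Bytes = [228,184,173,230,150,135,99,104,105,110,101,115,101];
var codeUnits = [20013,25991,99,104,105,110,101,115,101];

var fromBytes = bufferToString(utf8Bytes); // one character per byte, not decoded
var fromUnits = bufferToString(codeUnits); // one character per code unit

console.log(fromBytes.length, fromUnits.length);
console.log(fromUnits);
```

The byte array yields 13 characters because each byte becomes its own character, while the code-unit array yields the intended 9-character string.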

1

Chinese characters are encoded in UTF-8 as multi-byte sequences, so one character takes more than one byte (three bytes each in your array). When you do this

for (i in arr){
    str+=String.fromCharCode(arr[i]);
}

you are converting each byte to a one-character string and appending it to str. What you need to do instead is combine the bytes of each multi-byte sequence into a single character.

I changed your array to this and it worked for me:

var arr=[20013,25991,99,104,105,110,101,115,101];

I got these codes from here.

You can also take a look at this for packing bytes into a string.
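The values in the replacement array are UTF-16 code units. As a minimal sketch, assuming the target text is already available as a JavaScript string literal, they can be obtained with charCodeAt() and turned back into a string with String.fromCharCode(); the variable names are hypothetical.

```javascript
var text = '中文chinese';

// Collect one UTF-16 code unit per character
// (every character here is in the Basic Multilingual Plane).
var codes = [];
for (var i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
}
console.log(codes);

// fromCharCode accepts many code units at once, so apply() can pass the whole array.
var roundTrip = String.fromCharCode.apply(null, codes);
console.log(roundTrip);
```

For very large arrays, apply() can exceed the engine's argument limit, so converting in chunks would be safer in that case.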
