Question (score 25)

I have a variable that contains a string consisting of Japanese characters, for instance:

"みどりいろ"

How would I go about converting this to its JavaScript escape form?

The result I am after for this example specifically is:

"\u307f\u3069\u308a\u3044\u308d"

I'd prefer a jQuery approach if there is one.

Comment: @SergeiZahharenko - escape("abc") //"abc"... (Jan 9, 2014 at 8:06)

6 Answers

Answer (score 41)
"み".charCodeAt(0).toString(16);

This gives you the character's UTF-16 code unit in hexadecimal. You can run it through a loop:

String.prototype.toUnicode = function(){
    var result = "";
    for(var i = 0; i < this.length; i++){
        // charCodeAt returns UTF-16 code units, so a character above U+FFFF
        // comes out as two \u escapes (a surrogate pair), which is still valid
        result += "\\u" + ("000" + this[i].charCodeAt(0).toString(16)).slice(-4);
    }
    return result;
};

"みどりいろ".toUnicode();       //"\u307f\u3069\u308a\u3044\u308d"
"Mi Do Ri I Ro".toUnicode();  //"\u004d\u0069\u0020\u0044\u006f\u0020\u0052\u0069\u0020\u0049\u0020\u0052\u006f"
"Green".toUnicode();          //"\u0047\u0072\u0065\u0065\u006e"

Demo: http://jsfiddle.net/DerekL/X7MCy/

More on: .charCodeAt
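Because charCodeAt works on UTF-16 code units, a character above U+FFFF (an emoji, say) comes out of a loop like the one above as two \u escapes forming a surrogate pair. That is still valid JavaScript, but if you would rather have one escape per character, here is a minimal sketch using codePointAt and the ES2015 \u{...} syntax. The name toUnicodeEscapes is hypothetical, and the \u{...} form assumes the output is read by an ES2015+ JavaScript engine; JSON does not accept it.

```javascript
// Hypothetical helper: escape every character of `str`, one escape per code point.
// BMP characters (U+0000..U+FFFF) become \uXXXX; astral characters become \u{XXXXX},
// which requires an ES2015+ JavaScript parser (JSON does not accept this form).
function toUnicodeEscapes(str) {
  let result = "";
  // for...of iterates by code point, so a surrogate pair arrives as one string
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    const hex = cp.toString(16);
    result += cp > 0xffff ? "\\u{" + hex + "}" : "\\u" + hex.padStart(4, "0");
  }
  return result;
}

console.log(toUnicodeEscapes("みどりいろ")); // \u307f\u3069\u308a\u3044\u308d
console.log(toUnicodeEscapes("\u{1F4A9}"));  // \u{1f4a9}
```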

14 Comments

My bad :) For some reason I missed the .toString(16) part
@EladStern - It's okay.
You can replace while(partial.length !== 4) partial = "0" + partial; with ('0000' + partial).substr(-4); which I would prefer :)
@Adassko - Ooo nice idea.
You can also replace your loop with a replace function. Then the whole function will be: return this.replace(/./g, function(c) { return "\\u" + ('000' + c.charCodeAt(0).toString(16)).substr(-4) }); :P
Answer (score 12)

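The answer above is reasonable. Here is a slight space and performance optimization: it only escapes characters outside the range U+0000 to U+007E, so plain ASCII is left as is.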

function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function(ch) {
        return "\\u" + ("000" + ch.charCodeAt().toString(16)).slice(-4);
    });
}
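Because escapeUnicode leaves everything from U+0000 to U+007E untouched, control characters, the double quote and the backslash pass through unchanged. If the result has to go between double quotes in JavaScript or JSON source, those need escaping too. A minimal sketch of that variant (the name escapeForStringLiteral is hypothetical), checked by decoding the result with JSON.parse:

```javascript
// Hypothetical variant of escapeUnicode: besides non-ASCII characters it also
// escapes control characters (below U+0020), DEL, the double quote and the backslash,
// so the result can be placed between double quotes in JavaScript or JSON source.
function escapeForStringLiteral(str) {
  return str.replace(/[^ -~]|["\\]/g, function (ch) {
    // charCodeAt returns a UTF-16 code unit, so astral characters become surrogate pairs
    return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

const input = 'Say "みどり"\n';
const escaped = escapeForStringLiteral(input);
console.log(escaped); // Say \u0022\u307f\u3069\u308a\u0022\u000a
console.log(JSON.parse('"' + escaped + '"') === input); // true
```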

Answer (score 7)

Just

escape("みどりいろ")

should meet your needs in most cases, but if you need the "\u" form instead of "%xx" / "%uxxxx", you might want to use regular expressions:

escape("みどりいろ").replace(/%/g, '\\').toLowerCase()

escape("みどりいろ").replace(/%u([A-F0-9]{4})|%([A-F0-9]{2})/g, function(_, u, x) { return "\\u" + (u || '00' + x).toLowerCase() });

(toLowerCase is optional; it only makes the output look exactly like the example in the question.)

In most cases it leaves alone characters that don't need escaping, which may be a plus for you; if not, see Derek's answer or use my version:

'\\u' + "みどりいろ".split('').map(function(t) { return ('000' + t.charCodeAt(0).toString(16)).substr(-4) }).join('\\u');

3 Comments

Upvoted because this works too (only for characters other than Latin letters and common punctuation marks).
Fails for characters in the range U+0000 to U+001F, U+007F to U+00FF plus various punctuation marks. These characters get escaped to %xx instead of %uxxxx, which results in invalid backslash escapes. You would have to do two replacements, one for %u to \u and then one for % to \x. Also the toLowerCase() is superfluous and would lose information for unescaped characters.
Does this pass The Pile of Poo Test™? :P
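The two-replacement fix described in the comments (%u to \u, then the remaining % to \x) could look like this. It is a sketch: the function name escapeViaEscape is hypothetical, and escape() itself is a legacy function from Annex B of the ECMAScript specification, deprecated but still provided by browsers and Node.js.

```javascript
// Sketch of the two replacements suggested in the comments.
// escape() yields %uXXXX for code units above U+00FF and %XX for those below
// (except the characters it leaves alone: A-Z a-z 0-9 @ * _ + - . /).
function escapeViaEscape(str) {
  return escape(str)
    .replace(/%u([0-9A-F]{4})/g, "\\u$1") // %u307F -> \u307F
    .replace(/%([0-9A-F]{2})/g, "\\x$1");  // %20 -> \x20, %E9 -> \xE9
}

console.log(escapeViaEscape("みどりいろ")); // \u307F\u3069\u308A\u3044\u308D
console.log(escapeViaEscape("café 💩"));    // caf\xE9\x20\uD83D\uDCA9
```

Because escape() also percent-encodes %, the double quote and the backslash, the result can be placed inside a double-quoted JavaScript string literal; the \xXX form is not valid JSON, though. As for the Pile of Poo test: escape() works on UTF-16 code units, so U+1F4A9 comes out as the surrogate pair \uD83D\uDCA9, which JavaScript decodes back to the original character.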
Answer (score 1)

My version of the code, based on previous answers. I use it to escape non-ASCII characters in JSON.stringify().

// Escape every character that is not an ASCII letter, digit, whitespace or common
// punctuation, padding each UTF-16 code unit's hex code to exactly four digits
const toUTF8 = string =>
    string.split('').map(
        ch => !ch.match(/^[^a-z0-9\s\t\r\n_|\\+()!@#$%^&*=?/~`:;'"\[\]\-]+$/i)
            ? ch
            : '\\u' + ('000' + ch.charCodeAt(0).toString(16)).slice(-4)
    ).join('');

Usage:

JSON.stringify({key: 'Категория дли импорта'}, (key, value) => {
    if (typeof value === "string") {
        return toUTF8(value);
    }

    return value;
});

Returns JSON:

{"key":"\\u041a\\u0430\\u0442\\u0435\\u0433\\u043e\\u0440\\u0438\\u044f \\u0434\\u043b\\u0438 \\u0438\\u043c\\u043f\\u043e\\u0440\\u0442\\u0430"}

1 Comment

Those \u sequences make no sense.
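In the replacer approach above, JSON.stringify escapes the backslash that the replacer inserts, which is why the output shows \\u sequences; after JSON.parse they come back as literal text rather than as the original characters. Assuming the goal is JSON text that contains only ASCII, one alternative is to stringify first and post-process the result. A minimal sketch (the name stringifyAscii is hypothetical):

```javascript
// Hypothetical helper: produce JSON text containing only ASCII characters.
// JSON.stringify already escapes quotes, backslashes and control characters, so
// only code units from U+007F upward remain to be replaced with \uXXXX escapes.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u007f-\uffff]/g, function (ch) {
    return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

const json = stringifyAscii({ key: "Категория" });
console.log(json); // {"key":"\u041a\u0430\u0442\u0435\u0433\u043e\u0440\u0438\u044f"}
console.log(JSON.parse(json).key === "Категория"); // true
```

Because the escapes are added after serialization, they are single JSON escapes, so JSON.parse restores the original strings (keys included).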
Answer (score 0)

Just use the encodeURI function:

encodeURI("みどりいろ")
"%E3%81%BF%E3%81%A9%E3%82%8A%E3%81%84%E3%82%8D"

And the other side decode it back:

decodeURI("%E3%81%BF%E3%81%A9%E3%82%8A%E3%81%84%E3%82%8D")
"みどりいろ"

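Note that encodeURI does not produce the form asked for in the question: each %XX is one byte of the character's UTF-8 encoding, while a \uXXXX escape is one UTF-16 code unit. A small comparison for み, assuming an environment with a global TextEncoder (modern browsers and Node.js 11 or later):

```javascript
// Compare the UTF-8 bytes behind encodeURI's %XX output with the UTF-16 code unit
// that a \uXXXX escape represents, for the single character み (U+307F).
const ch = "み";

// UTF-8: three bytes, matching the "%E3%81%BF" prefix of encodeURI("みどりいろ")
const utf8Bytes = Array.from(new TextEncoder().encode(ch), function (b) {
  return b.toString(16).toUpperCase();
});
console.log(utf8Bytes.join(" ")); // E3 81 BF

// UTF-16: one code unit, which is what "\u307f" encodes
console.log(ch.charCodeAt(0).toString(16)); // 307f
```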
Answer (score -1)

This function I made worked for me. It converts every character other than the ASCII letters a-z and A-Z to a \u escape.

function toUnicode(word) {
    let array = word.split("");
    array = array.map((character) => {
        if (character.match(/[^a-zA-Z]/)) {
            // pad the hex code to exactly four digits (e.g. "307f", "0020")
            let conversion = ("000" + character.charCodeAt(0).toString(16)).slice(-4);
            return "\\u" + conversion;
        }
        return character;
    });
    return array.join("");
}

1 Comment

This works for some characters but for "higher" characters like ✓ it doesn't. The code from Adam Leggett below stackoverflow.com/a/40558081/3434804 gets the job done.
