I am currently writing a small parser for JSON documents. Unicode characters can be represented by escapes such as \u0628.

How can I turn the string \u0628 into a real Java character?

2 Answers

1

You can use a piece of code like this. It only works when the token consists solely of \u escapes, because it splits on every "u":

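// Take the first space-delimited token of the input.
String str = myString.split(" ")[0];
// Drop the backslashes, leaving e.g. "u0048u0065".
str = str.replace("\\", "");
// arr[0] is the empty string before the first 'u', so the loop starts at 1.
String[] arr = str.split("u");
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) {
    // Parse the four hex digits and append them as one UTF-16 code unit.
    int hexVal = Integer.parseInt(arr[i], 16);
    text.append((char) hexVal);
}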

Or you can use Apache Commons Lang:

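import org.apache.commons.lang.StringEscapeUtils;
import org.junit.Test; // JUnit 4 assumed

@Test
public void testUnescapeJava() {
    // At runtime the string holds five backslash-u escapes, not the letters themselves.
    String sJava = "\\u0048\\u0065\\u006C\\u006C\\u006F";
    System.out.println("StringEscapeUtils.unescapeJava(sJava):\n" + StringEscapeUtils.unescapeJava(sJava));
}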


Output:

StringEscapeUtils.unescapeJava(sJava):
Hello
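
For the single escape in the question, the manual approach reduces to one parse call. The following is a minimal sketch, assuming the input is exactly one escape (backslash, 'u', four hex digits); the class and method names are made up for illustration.

```java
final class SingleEscape {
    /** Converts one backslash-u escape with four hex digits into a char. */
    static char toChar(String escape) {
        // The regex reads: literal backslash, 'u', then exactly four hex digits.
        if (!escape.matches("\\\\u[0-9a-fA-F]{4}")) {
            throw new IllegalArgumentException("Not a backslash-u escape: " + escape);
        }
        // Four hex digits always fit into a 16-bit char.
        return (char) Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        char c = toChar("\\u0628");
        System.out.println(Integer.toHexString(c)); // 628
    }
}
```

The result is a single UTF-16 code unit, so characters outside the Basic Multilingual Plane still need two escapes and two chars.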

1

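You can decode the two bytes of the code unit using the UTF_16 charset, which assumes big-endian byte order when there is no byte-order mark.

For example: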

byte[] data = {0x06, 0x28};
String string = new String(data, StandardCharsets.UTF_16);
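
To check what this decoding produces, here is a small self-contained sketch. It uses StandardCharsets.UTF_16BE instead of UTF_16 so that the byte order is explicit and a leading FE FF pair is not consumed as a byte-order mark; that substitution and the names below are choices made for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf16BytesDemo {
    /** Decodes a high and a low byte as one big-endian UTF-16 code unit. */
    static String decode(int high, int low) {
        // Narrowing to byte keeps the bit pattern, so values above 0x7f work too.
        byte[] data = {(byte) high, (byte) low};
        return new String(data, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = decode(0x06, 0x28);
        System.out.println(s.length());                        // 1
        System.out.println(Integer.toHexString(s.charAt(0)));  // 628
    }
}
```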

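You could find the escapes using a regex:

import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 captures the high byte, group 2 the low byte, as two hex digits each.
private static final Pattern ESCAPE_PATTERN = Pattern.compile("\\\\u([0-9a-fA-F]{2})([0-9a-fA-F]{2})");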

public static String replaceCharEscapes(String input) {
    Matcher m = ESCAPE_PATTERN.matcher(input);
    if (!m.find()) {
        return input;
    }
    StringBuilder outputBuilder = new StringBuilder(input.subSequence(0, m.start()));
    int lastEnd = m.end();
    outputBuilder.append(getChar(m));

    while (m.find()) {
        outputBuilder.append(input.subSequence(lastEnd, m.start()))
                .append(getChar(m));
        lastEnd = m.end();
    }

    if (lastEnd != input.length()) {
        outputBuilder.append(input.subSequence(lastEnd, input.length()));
    }

    return outputBuilder.toString();
}

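private static String getChar(Matcher m) {
    // Byte.parseByte rejects values above 7f, so parse as int and narrow to byte.
    // Pass the charset explicitly; without it the platform default charset is used.
    return new String(new byte[] {
        (byte) Integer.parseInt(m.group(1), 16),
        (byte) Integer.parseInt(m.group(2), 16),
    }, StandardCharsets.UTF_16BE);
}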

Example:

replaceCharEscapes("\\u0043:\\\\u0050\\u0072\\u006f\\u0067\\u0072\\u0061\\u006ds")

returns C:\Programs
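
One limitation of decoding each escape through a charset is that a lone surrogate is not valid UTF-16 on its own, so a character written as two escapes (a surrogate pair) would most likely come out as replacement characters. A possible alternative, sketched here with hypothetical names, appends each escape directly as a char in a single pass. Like the regex version, it treats a backslash before another backslash as ordinary text, which a full JSON parser would handle differently.

```java
final class UnicodeUnescaper {
    /** Replaces every backslash-u escape with four hex digits by its UTF-16 code unit. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (isEscapeAt(input, i)) {
                // Appending the code unit directly keeps surrogate pairs intact.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True if a backslash, 'u' and four ASCII hex digits start at index i. */
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (!isHex(s.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // Same input as the example above.
        System.out.println(unescape("\\u0043:\\\\u0050\\u0072\\u006f\\u0067\\u0072\\u0061\\u006ds")); // C:\Programs
        String smiley = unescape("\\uD83D\\uDE00");
        System.out.println(Integer.toHexString(smiley.codePointAt(0))); // 1f600
    }
}
```

Truncated or malformed escapes are copied through unchanged rather than rejected; a strict parser might prefer to throw instead.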
