2

I am getting the following encoded HTML in a JSON response and have no idea how to decode it to a normal HTML string (it is an anchor tag, by the way).

x3ca hrefx3dx22http:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx22x3ehttp:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx3c\/ax3e

I have tried java.net.URLDecoder.decode without any luck.

2
  • That's not JSON at all. Where is this data coming from that is claiming it is JSON? Commented Sep 23, 2010 at 6:05
  • here is the actual JSON response [{"type":"text","text":"Resentment - B\x27Day is the second studio album by American R\x26B singer Beyoncé Knowles, released September 4, 2006, on Columbia Records in collaboration with Music World Music and Sony Urban Music. Its release coincided with Knowles\x27 twenty-fifth birthday. ...","language":"en"},{"type":"url","text":"\x3ca href\x3d\x22http://en.wikipedia.org/wiki/Resentment_(song)\x22\x3ehttp://en.wikipedia.org/wiki/Resentment_(song)\x3c/a\x3e","language":"en"}] Commented Sep 23, 2010 at 6:13

4 Answers

7

The term you are looking for is "UTF-8 code units". Each of these code units is written as a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:

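public static String convertUTF8Units(String input) {
    String part = "", output = input;
    // Slide a four-character window ("\xHH") over the input; "<=" so the final window is checked too.
    for(int i=0;i<=input.length()-4;i++) {
        part = input.substring(i, i+4);
        if(part.startsWith("\\x")) {
            // Parse the two hex digits after "\x" into a single byte.
            byte[] rawByte = new byte[1];
            rawByte[0] = (byte) (Integer.parseInt(part.substring(2), 16) & 0x000000FF);
            // Note: new String(byte[]) decodes with the platform default charset.
            String raw = new String(rawByte);
            // Replace every occurrence of this escape in the output.
            output = output.replace(part, raw);
        }
    }

    return output;
}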

I know it's a bit frowzy, but it works :)
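If the escapes really are UTF-8 code units, a character outside ASCII would arrive as several consecutive \xHH bytes. The following is only a sketch under that assumption (the sample response in the question shows ASCII escapes only): it collects adjacent escapes into a byte buffer and decodes each run as UTF-8 in a single pass. The class name Utf8EscapeDecoder and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class Utf8EscapeDecoder {

    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            if (isHexEscapeAt(input, i)) {
                // Collect the byte; consecutive escapes may form one multi-byte character.
                pending.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                flush(pending, out);
                out.append(input.charAt(i));
                i++;
            }
        }
        flush(pending, out);
        return out.toString();
    }

    // True if a backslash, an 'x' and two hex digits start at index i.
    private static boolean isHexEscapeAt(String s, int i) {
        return i + 4 <= s.length()
                && s.charAt(i) == '\\' && s.charAt(i + 1) == 'x'
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }

    // Decode the collected bytes as UTF-8 (assumption) and append them.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("B\\x27Day is an R\\x26B album"));
    }
}
```

Text that merely looks similar, such as "\xZZ", is left untouched instead of throwing a NumberFormatException.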


2 Comments

Thanks Keenora, but I already did it using a regular expression.
I needed it for PowerShell and I could not get it converted in a fast way, then I found a way simpler method here: stackoverflow.com/a/49344121/2964949
1

That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ascii code is 0xYZ". I'm not sure how the letter x itself would be encoded, so I would recommend trying to find out. But then you can just do a find and replace on the regex x([0-9a-f]{2}), by getting the integer represented by the two hex numbers, and then casting it to a char (or something similar to that).

Then also, it looks like slashes (and other characters? See if you can find out...) always have a backslash in front of them, so do another find-and-replace for that.
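The two find-and-replace passes described above could look roughly like this in Java. It is a sketch, not a definitive implementation: it assumes the escapes carry a leading backslash (as in the raw JSON posted in the question's comments), that only code points up to 0xFF occur, and that "\/" is the only other escape to undo. The class name HexEscapeDecoder and the example.com URL are placeholders.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the regex approach.
final class HexEscapeDecoder {

    // A backslash, an 'x', and exactly two hex digits, e.g. \x3c.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Turn the two hex digits into the character with that code.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '\' or '$' from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        // Second pass: drop the backslash in front of escaped slashes.
        return out.toString().replace("\\/", "/");
    }

    public static void main(String[] args) {
        String raw = "\\x3ca href\\x3d\\x22http:\\/\\/example.com\\/\\x22\\x3elink\\x3c\\/a\\x3e";
        System.out.println(decode(raw)); // <a href="http://example.com/">link</a>
    }
}
```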

2 Comments

You should also try to figure out how unicode characters above ff would be represented, and be sure to modify your approach accordingly.
I faced the same problem when retrieving Arabic JSON data from this link: facebook.com/feeds/page.php?id=103622369714881&format=json. Can you please tell me what you did?
1

Thanks!!

Take care: in the for loop the operator must be "<=", otherwise an escape at the very end of the input can't be decoded.

for(int i=0;i<=input.length()-4;i++) {..}

Cheers!


-2

This works for me

    public static String convertUTF8Units_version2(String input) throws UnsupportedEncodingException
    {
         return URLDecoder.decode(input.replaceAll("\\\\x", "%"),"UTF-8");
    }
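One caveat worth checking before relying on this: URLDecoder applies form-decoding rules to the whole string, not just to the rewritten escapes. The sketch below (the class name UrlDecoderCaveat and the sample inputs are made up for illustration) shows that a plus sign becomes a space and that a literal percent sign not followed by two hex digits is rejected.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; viaUrlDecoder mirrors the one-liner above.
final class UrlDecoderCaveat {

    static String viaUrlDecoder(String input) throws UnsupportedEncodingException {
        return URLDecoder.decode(input.replaceAll("\\\\x", "%"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The escape itself is decoded as intended.
        System.out.println(viaUrlDecoder("R\\x26B"));   // R&B
        // A '+' in the surrounding text is turned into a space.
        System.out.println(viaUrlDecoder("1+1\\x3d2")); // 1 1=2
        // A literal '%' that is not followed by two hex digits is rejected.
        try {
            viaUrlDecoder("100% \\x3c");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the text can contain '+' or '%', a decoder that touches only the \xHH sequences avoids these side effects.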
