
Possible Duplicate:
Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?

I have string data in which some special characters are encoded in this format: &#039

In this case, that encoding represents the ' sign (a single quote).

For example, "the citizen&#039s home" should appear as "the citizen's home", but it does not.

Unfortunately, this is not interpreted as such, so I need to parse all of my strings for these sequences and convert them.

First: what is this format called? Knowing that will help me find a conversion method.

Second: do you know of a method to fix my strings?

This format is called an HTML entity (in decimal). Commented Jul 9, 2012 at 18:58
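To make the comment concrete: `&#039` is a decimal numeric reference to code point 39, the apostrophe. If adding a library is not an option, a minimal JDK-only sketch of decoding such references could look like the following. It is an illustration, not the Apache Commons implementation; the class and method names are hypothetical, and it assumes only numeric references (decimal `&#NNN;` and hexadecimal `&#xHH;`) need decoding. Because the question's sample omits the terminating semicolon, the sketch treats the semicolon as optional.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references using only the JDK.
final class NumericEntityDecoder {
    // Group 1 captures decimal digits (&#39), group 2 captures hex digits (&#x27).
    // The trailing semicolon is optional because the question's data omits it.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));?");

    static String decodeNumericEntities(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one.
            out.append(input, last, m.start());
            try {
                int codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
                // toChars handles supplementary code points (surrogate pairs).
                out.append(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                // Number too large or not a valid code point: keep the text verbatim.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNumericEntities("the citizen&#039s home"));
    }
}
```

Running `main` prints `the citizen's home`. Invalid references such as `&#99999999999;` are left untouched rather than throwing.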

1 Answer


No need to reinvent the wheel: Apache Commons Lang's StringEscapeUtils.unescapeHtml4(String) is what you want.

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.

For example, the string "&lt;Fran&ccedil;ais&gt;" will become "<Français>"

If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
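The behavior quoted above (known entities decoded, unknown ones copied through verbatim) can be illustrated with a small lookup-table sketch. This is not the Commons source: the class name and the `ENTITIES` table are hypothetical, and the table is a tiny subset chosen only to reproduce the two documented examples, whereas the library covers the full HTML 4.0 entity set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of table-driven named-entity decoding.
final class NamedEntityDecoder {
    // Deliberately tiny subset; a real decoder needs every HTML 4.0 entity.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("amp", "&");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("ccedil", "\u00E7"); // c with cedilla
    }

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([a-zA-Z][a-zA-Z0-9]*);");

    static String decodeNamedEntities(String input) {
        Matcher m = NAMED_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            String decoded = ENTITIES.get(m.group(1));
            // Unrecognized entities are inserted verbatim into the result.
            out.append(decoded != null ? decoded : m.group());
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNamedEntities("&lt;Fran&ccedil;ais&gt;"));
        System.out.println(decodeNamedEntities("&gt;&zzzz;x"));
    }
}
```

Decoding in a single left-to-right pass matters: `&amp;lt;` becomes the literal text `&lt;` instead of being decoded a second time into `<`.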
