
I have an XML string encoded in UTF-16. When I parse it with javax.xml.parsers.DocumentBuilder, I get an error like this:

Character reference "&#x0" is an invalid XML character

Here is the code I used to parse the XML:

InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlString));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
org.w3c.dom.Document document = parser.parse(inputSource);

My question is: how can I replace the invalid characters with a space?

  • You must do this before you parse the XML. Commented Aug 3, 2012 at 14:13
  • I know that I must do this before parsing, but the question is how to do it? Commented Aug 3, 2012 at 14:18
  • Check this answer from another Stack Overflow thread: stackoverflow.com/a/4237934/405117 Commented Aug 3, 2012 at 14:18

3 Answers


You just need to use String.replaceAll and pass the pattern of invalid characters.
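A fixed replaceAll pattern is awkward here, because the sample input contains numeric character references (such as &#x0; or &#x14;) rather than raw control characters, and whether a reference is legal depends on its numeric value. The following is a minimal sketch, not a definitive implementation: it assumes XML 1.0 rules for valid characters, and the class and method names (XmlCharRefCleaner, replaceInvalidCharRefs, isValidXml10) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (name chosen for this example): replaces numeric
// character references that point at characters XML 1.0 forbids.
final class XmlCharRefCleaner {

    // Matches hex references like &#x0; and decimal references like &#20;.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    // XML 1.0 "Char" production: tab, LF, CR, and the ranges below.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input with every invalid numeric reference replaced by a
    // single space; valid references are left untouched.
    static String replaceInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                cp = -1; // value too large to be a code point, treat as invalid
            }
            String replacement = isValidXml10(cp) ? m.group() : " ";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Call XmlCharRefCleaner.replaceInvalidCharRefs(xmlString) and pass the result to the StringReader before parsing. Note that this sketch scans the whole string, so it would also rewrite references that appear inside CDATA sections or comments.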


1 Comment

My xmlString looks something like this: <?xml version="1.0" encoding="utf-16"?> <ITEM version="1.0"> <PROP NAME="cont">This is my content &#x3;&#x4;&#x14;&#x0;&#x8;&#x0;&#x8;&#x0;</PROP> </ITEM> What is the pattern? Thanks

You are trying to parse an invalid XML entity, and that is what raises the exception. It seems you do not need to worry about UTF-16 in your situation.

Find some explanation and example here.

For example, a bare & character cannot appear in valid XML; we need to write &amp; instead. Here &amp; is the XML entity.

The example above should make it clear what an XML entity is.

As I understand it, some XML entities are not valid. That is not a problem either: it is possible to declare and add new XML entities. Take a look at the article above for more detail.


EDIT: This assumes that a & character is what makes the XML invalid.


StringEscapeUtils (Apache Commons Lang)

escapeXml

public static void escapeXml(java.io.Writer writer,
                             java.lang.String str)
                      throws java.io.IOException

Escapes the characters in a String using XML entities.

For example: "bread" & "butter" => &quot;bread&quot; &amp; &quot;butter&quot;.

Supports only the five basic XML entities (gt, lt, quot, amp, apos). 
Does not support DTDs or external entities.

Note that unicode characters greater than 0x7f are currently escaped to their 
numerical \\u equivalent. This may change in future releases.

Parameters:
    writer - the writer receiving the unescaped string, not null
    str - the String to escape, may be null 
Throws:
    java.lang.IllegalArgumentException - if the writer is null 
    java.io.IOException - if there is a problem writing
See Also:
    unescapeXml(java.lang.String)

1 Comment

This function is deprecated. It is replaced by either escapeXml10 or escapeXml11. Note that these functions also filter out the invalid characters. Also note that this doesn't solve the OP's question. The DOM API already escapes the predefined entities.
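To illustrate what filtering the invalid characters means, here is a rough standard-library sketch. It is not the Commons Lang implementation; it assumes the XML 1.0 character ranges, and the names XmlRawCharFilter and replaceInvalidChars are hypothetical. It only handles raw characters in the string, so it would not touch numeric references such as &#x0; that are written out as text.

```java
import java.util.regex.Pattern;

// Hypothetical helper (name chosen for this example): replaces raw
// characters that are not allowed in an XML 1.0 document with a space.
final class XmlRawCharFilter {

    // Negated class of the XML 1.0 "Char" production: anything that is NOT
    // tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD, or U+10000..U+10FFFF.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    static String replaceInvalidChars(String text) {
        return INVALID_XML10.matcher(text).replaceAll(" ");
    }
}
```

Replacing with a space rather than deleting keeps the text length stable, which matches what the question asks for; pass an empty string to replaceAll instead if the characters should simply be dropped.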
