3

I have an XML file, sample.xml, that contains the following:

<Tokens>
   <Token>Hello&nbsp;World</Token>
</Tokens>

I want to parse it, but I get errors when the parser reaches the &nbsp; entity.

I do not have access to the schema for the XML I am using (the one that defines Token or Tokens).

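import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse("sample.xml");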

Since I do not have the Schema for my XML document, I was wondering if there is a way to have it completely ignore the HTML special characters while parsing?

4 Answers

3

In XML, &nbsp; is an entity reference, but an undefined one unless you provide a definition. You cannot make an XML parser ignore such references, but you can define them, e.g. by starting your document with

<!DOCTYPE Tokens [<!ENTITY nbsp "&#xa0;">]>

However, this is probably not useful if you are generating the XML file. You might just as well generate a document containing the real character “ ” U+00A0 NO-BREAK SPACE, or the character reference &#xa0; or its decimal equivalent &#160;.

See also the question "How do I define HTML entity references inside a valid XML document?"
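As a rough illustration of the DOCTYPE approach, here is a minimal sketch that parses the sample document from a string once the nbsp entity has been declared in an internal DTD subset. The class name NbspEntityDemo and the helper firstTokenText are hypothetical names made up for this example; it assumes the default JDK parser settings, which permit DOCTYPE declarations and expand entity references.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
class NbspEntityDemo {

    // The sample document, prefixed with an internal DTD subset that
    // declares nbsp as U+00A0 (NO-BREAK SPACE).
    static final String XML =
          "<!DOCTYPE Tokens [<!ENTITY nbsp \"&#xa0;\">]>\n"
        + "<Tokens>\n"
        + "   <Token>Hello&nbsp;World</Token>\n"
        + "</Tokens>";

    // Parses the given XML text and returns the text of the first Token element.
    static String firstTokenText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Token").item(0).getTextContent();
    }
}
```

Calling NbspEntityDemo.firstTokenText(NbspEntityDemo.XML) should yield the token text with a real U+00A0 character between the two words. If the factory has been hardened to disallow DOCTYPE declarations, this approach will not work as written.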

0

What you ask for is impossible because, to parse it as XML, the entity must have a definition somewhere. To parse it as something other than XML, you need to write your own parser or use a tolerant parser. XML is not tag soup.

0

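XML doesn’t define &nbsp;, although XHTML does (through its DTD). XML itself predefines only five entities: &lt;, &gt;, &amp;, &apos; and &quot;. See the list of predefined entities in XML.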

The solution is to use the numeric character reference for the non-breaking space, &#160;, when building the XML instead. In some cases a plain space (&#32;) works too. Alternatively, before parsing the XML you can replace each &nbsp; with a plain space ' '.

0

I agree with Reedwald. But as a workaround, you can read the file as a string and replace each &nbsp; with a space before parsing the document.
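The replace-before-parsing workaround could look roughly like the sketch below. The class NbspReplaceDemo and its method names are hypothetical, and the file is assumed to be UTF-8 encoded. The sketch substitutes the character reference &#160; so that the non-breaking space survives; replacing with a plain " " works the same way if an ordinary space is acceptable.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answers.
class NbspReplaceDemo {

    // Swaps the undeclared entity reference for a numeric character
    // reference, which needs no declaration.
    static String replaceNbsp(String xml) {
        return xml.replace("&nbsp;", "&#160;");
    }

    // Parses XML text that is already held in memory.
    static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Reads the file as a string (assumption: UTF-8), patches it, then parses it.
    static Document parseFile(String path) throws Exception {
        String raw = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        return parseXml(replaceNbsp(raw));
    }
}
```

For the file in the question, this would be called as NbspReplaceDemo.parseFile("sample.xml"). Note that a plain textual replace also rewrites any &nbsp; that appears inside CDATA sections or comments, where it is literal text rather than an entity reference.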
