I have some code that transforms an Excel file into an XML file, but when a cell's text contains special characters, I'm unable to handle them correctly. For example, a cell contains text like

(Destinataire de flux entrants ou Origine de flux sortants) **==>** trallla 

When transforming it into XML, I get

(Destinataire de flux entrants ou Origine de flux sortants) **==&gt** trallla  

How can I get around this problem?

3 Answers

4

You do not want a raw '>' inside the value of an XML element, because it is the character that denotes the end of a tag. If it is automatically substituted with &gt;, be glad that it is: this is the writer escaping markup characters so the XML stays well-formed (for '<' and '&' the escaping is mandatory, and most writers escape '>' along with them). Any XML parser that reads the file afterwards knows how to handle &gt; and substitutes '>' back in.
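To illustrate the re-substitution, here is a minimal sketch using only the JDK's built-in DOM parser. The class name `EntityRoundTrip`, the method `rootText`, and the `<cell>` element are made up for this example and are not from the question's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityRoundTrip {
    // Parses a small XML string and returns the text content of its root element.
    // The parser resolves predefined entities such as &gt; while reading.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical element name; the escaped text mirrors the question's example.
        String xml = "<cell>(Destinataire de flux entrants ou Origine de flux sortants)"
                + " ==&gt; trallla</cell>";
        System.out.println(rootText(xml));
    }
}
```

The point is that the escaped form only exists in the serialized file. Code that reads the value through a parser sees the original '>' again.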


1

You can also wrap the text in a CDATA section, if that helps solve your problem.
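As a rough sketch of the CDATA approach, assuming you control the code that writes the XML: the JDK's DOM API can create a CDATA section, and the default serializer should write it out without escaping the text. The class name `CdataCell`, the method `toXml`, and the `<cell>` element are hypothetical names for this example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataCell {
    // Builds <cell><![CDATA[...]]></cell> for the given text and returns it as a string.
    static String toXml(String cellText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element cell = doc.createElement("cell");
            // A CDATA section keeps characters such as '>' literal in the output.
            cell.appendChild(doc.createCDATASection(cellText));
            doc.appendChild(cell);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("(Destinataire de flux entrants) ==> trallla"));
    }
}
```

Note that text containing the sequence `]]>` cannot sit inside a single CDATA section, so this sketch does not cover that case.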

2 Comments

CDATA is the recommended (and intended) mechanism for potentially "unsafe" character data.
That's true, but he only parses the XML that has been generated by Excel. Therefore CDATA isn't a solution, as he cannot change the creation process.
0

If you have problems reading escaped HTML characters, you can use the Apache Commons Lang library, which includes the method StringEscapeUtils.unescapeHtml(..) (named unescapeHtml4(..) in Commons Lang 3).

The unescaped String is the input you want.
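If adding the library is not an option, the same idea can be sketched with plain JDK string replacement. This is a hand-written stand-in, not the Commons Lang implementation: the class `XmlUnescape` and its method are hypothetical, and it only handles the five predefined XML entities in their full form with a trailing semicolon.

```java
class XmlUnescape {
    // Replaces the five predefined XML entities with their literal characters.
    // "&amp;" is handled last so that "&amp;gt;" becomes "&gt;" rather than ">".
    static String unescapeXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeXml("flux sortants) ==&gt; trallla"));
    }
}
```

For anything beyond these five entities (numeric character references, named HTML entities), a real parser or the library method is the safer choice.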
