I spent quite some time trying to work out whether this is a bug or a gap in my own understanding. Basically, I'm trying to react to a specific element and read its contents with a Transformer, using the Java StAX API.
Everything works when the XML is pretty-printed or has whitespace between elements. However, as soon as the code is given XML with no whitespace between elements, it breaks badly.
The code and its output below illustrate the problem.
There are three sample XML strings: the first two show two different failure scenarios, while the last one shows correct processing:
In the first scenario, with no spaces, it skips some elements. In the example below it skips all but one "node" element. In my real-world scenario, though, it skips every other node instead, probably because of richer node content.
In the second scenario I added a space between the node elements only. As you can see, it fails to handle the end of the document properly.
In the last scenario I also added a space between the last node and the closing root element. Processing went as desired.
In my real-world scenario I expect single-line XML with no separators, so I need scenario 1 to work properly. I would also like to be sure that a valid change to the XML, such as adding a space between elements, does not break the processing the way it does in scenario 2.
Please help!!!
Complete code for the single-class application test.StAXTest:
package test;

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class StAXTest {

    private static final String XML1 = "<root><node></node><node></node></root>";
    private static final String XML2 = "<root><node></node> <node></node></root>";
    private static final String XML3 = "<root><node></node> <node></node> </root>";

    public static void main(String[] args) throws Exception {
        processXML(XML1);
        processXML(XML2);
        processXML(XML3);
    }

    private static void processXML(String xml) {
        try {
            System.out.println("XML Input:\n" + xml + "\nProcessing:");
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader reader = xif.createXMLStreamReader(new StringReader(xml));
            TransformerFactory tf = TransformerFactory.newInstance();
            int nodeCount = 0;
            while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
                String localName = reader.getLocalName();
                if (localName.equals("node")) {
                    Transformer t = tf.newTransformer();
                    StringWriter st = new StringWriter();
                    t.transform(new StAXSource(reader), new StreamResult(st));
                    String xmlNode = st.toString();
                    System.out.println(nodeCount + ": " + xmlNode);
                    nodeCount++;
                }
            }
        } catch (Throwable t) {
            t.printStackTrace(System.out);
        }
        System.out.println("------------------------------------------------");
    }
}
Application output for all three scenarios. Note that in the first scenario the transformed output contains one node, not two, so the second node is completely "lost in translation".
XML Input:
<root><node></node><node></node></root>
Processing:
0: <?xml version="1.0" encoding="UTF-8"?><node/>
------------------------------------------------
XML Input:
<root><node></node> <node></node></root>
Processing:
0: <?xml version="1.0" encoding="UTF-8"?><node/>
1: <?xml version="1.0" encoding="UTF-8"?><node/>
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[-1,-1]
Message: found: END_DOCUMENT, expected START_ELEMENT or END_ELEMENT
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.nextTag(XMLStreamReaderImpl.java:1247)
    at com.newedge.test.StAXTest.processXML(StAXTest.java:35)
    at com.newedge.test.StAXTest.main(StAXTest.java:21)
------------------------------------------------
XML Input:
<root><node></node> <node></node> </root>
Processing:
0: <?xml version="1.0" encoding="UTF-8"?><node/>
1: <?xml version="1.0" encoding="UTF-8"?><node/>
------------------------------------------------
<root><node></node> is kind of a valid XML fragment, but it doesn't know what to do with <node></node></root>. Maybe you need to specify the character encoding and use valid XML strings that start with a proper <?xml ...>, or something similar?
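Another thing that might be worth checking, judging only from the output above and not confirmed anywhere in this thread: the transform may leave the reader one event past the node's closing tag, so the next nextTag() call steps over whatever comes immediately after it. Below is a minimal sketch under that assumption. The class name StAXNodeSplitter and the method extractNodes are made up for illustration, and OMIT_XML_DECLARATION is set only to keep the printed output short. Instead of calling nextTag() unconditionally, the loop looks at where the reader sits after each transform and advances only when it is not already on a start element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, not part of the original post.
class StAXNodeSplitter {

    // Returns the serialized form of every <node> element found in the input.
    static List<String> extractNodes(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        TransformerFactory tf = TransformerFactory.newInstance();
        List<String> nodes = new ArrayList<>();

        while (true) {
            if (reader.isStartElement() && "node".equals(reader.getLocalName())) {
                Transformer t = tf.newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StAXSource(reader), new StreamResult(out));
                nodes.add(out.toString());

                // Assumption: the transform may already have moved the reader
                // onto the next element. If so, handle it without advancing.
                if (reader.isStartElement()) {
                    continue;
                }
            }
            if (!reader.hasNext()) {
                break; // END_DOCUMENT reached
            }
            reader.next();
        }
        reader.close();
        return nodes;
    }

    public static void main(String[] args) throws Exception {
        String[] inputs = {
            "<root><node></node><node></node></root>",
            "<root><node></node> <node></node></root>",
            "<root><node></node> <node></node> </root>"
        };
        for (String xml : inputs) {
            System.out.println(xml + " -> " + extractNodes(xml));
        }
    }
}
```

Because the loop checks the reader's position rather than assuming it, it should behave the same whether the transform stops on the closing tag or on the event after it, and whether or not there is whitespace between the elements.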