I have a large text file containing a sequence of well-formed XML documents, one after another, like this:
<DOC>
<TEXT> ... </TEXT>
...
</DOC>
<DOC>
<TEXT> ... </TEXT>
...
</DOC>
and so on. There is no <?xml version="1.0"> declaration and no single root element; each <DOC></DOC> pair delimits a separate XML document. What's the best way to parse this in Java and get the values under <TEXT> in each <DOC>?
If I pass the whole thing to a DocumentBuilder, I get an error saying the document is not well formed. Is there a better solution than simply traversing the file and building a string for each <DOC>?
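
One possible approach, shown here only as a minimal sketch: wrap the input stream in a synthetic root element so that a streaming parser sees a single well-formed document, then pull out each <TEXT> value with StAX. The class name ConcatenatedDocs, the method extractTexts, and the wrapper element name root are hypothetical names chosen for illustration. The sketch assumes the file is UTF-8 (or ASCII), contains no XML declaration or DOCTYPE anywhere, and that each <TEXT> holds only character data with no nested elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: names are illustrative, not from any library.
final class ConcatenatedDocs {

    // Returns the text of every <TEXT> element, in file order.
    static List<String> extractTexts(InputStream in) throws XMLStreamException {
        // Surround the original bytes with <root> ... </root> without copying them.
        // Assumes the input is UTF-8 compatible and has no XML declaration.
        List<InputStream> parts = Arrays.asList(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                in,
                new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)));
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(parts));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(wrapped);
        List<String> texts = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "TEXT".equals(reader.getLocalName())) {
                    // getElementText() throws if <TEXT> contains child elements.
                    texts.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return texts;
    }
}
```

Calling ConcatenatedDocs.extractTexts(new FileInputStream("docs.txt")) (the file name is a placeholder) would stream through the file without holding it all in memory at once. If a DOM per <DOC> is needed instead, the same wrapped stream could presumably be handed to a DocumentBuilder, at the cost of loading everything into memory.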