Java DOM: parse HTML inside an XML node

Question

I have a parser here:

package lt.prasom.functions;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.util.Properties;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import android.annotation.TargetApi;
import android.media.MediaRecorder.OutputFormat;
import android.util.Log;

public class XMLParser {

    // constructor
    public XMLParser() {

    }

    /**
     * Getting XML from URL making HTTP request
     * @param url string
     * */
    public String getXmlFromUrl(String url) {
        String xml = null;

        try {
            // defaultHttpClient
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);

            HttpResponse httpResponse = httpClient.execute(httpGet);
            HttpEntity httpEntity = httpResponse.getEntity();
            xml = EntityUtils.toString(httpEntity);

        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // return XML
        return xml;
    }

    /**
     * Getting XML DOM element
     * @param XML string
     * */
    public Document getDomElement(String xml){
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);

        try {
            DocumentBuilder db = dbf.newDocumentBuilder();

            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(xml));
            doc = db.parse(is);

        } catch (ParserConfigurationException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (SAXException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        } catch (IOException e) {
            Log.e("Error: ", e.getMessage());
            return null;
        }

        return doc;
    }

    /** Getting node value
      * @param elem element
      */
     @TargetApi(8)
    public final String getElementValue( Node elem , boolean html) {
         Node child;
         if( elem != null){
             if (elem.hasChildNodes()){
                 for( child = elem.getFirstChild(); child != null; child = child.getNextSibling() ){
                     if( child.getNodeType() == Node.TEXT_NODE  ){


                         //return child.getNodeValue();
                         return child.getNodeValue();
                     }
                 }
             }
         }
         return "";
     }

     /**
      * Getting node value
      * @param Element node
      * @param key string
      * */

     public String getValue(Element item, String str) {     
            NodeList n = item.getElementsByTagName(str);    

            return this.getElementValue(n.item(0), false);
        }

}

And here's my sample XML:

<items>
<item>
<name>test</name>
<description>yes <b>no</b></description>
</item>
</items>

When I parse description, I only get the text up to the first tag ("yes"). I want to get the raw content of the description tag. I tried a CDATA section, but it didn't work. Is there any way to do this without encoding the XML?

Thanks!

What happens if you parse <description><![CDATA[yes <b>no</b>]]></description>? You must either escape the < symbols or put everything into a CDATA section to get your description parsed as text.
– DRCB, Commented Sep 27, 2012 at 13:47
Is <b> supposed to represent bold? Or is it some arbitrary tag?
– Woot4Moo, Commented Sep 27, 2012 at 13:49
Your getElementValue() method returns when it sees the first text node. If you want all text, call getTextContent() on the "description" node.
– parsifal, Commented Sep 27, 2012 at 13:49
And if that doesn't answer your question, explain exactly what "didn't work" means. Tell us what you expect for output.
– parsifal, Commented Sep 27, 2012 at 13:51
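
To illustrate the two suggestions in the comments, here is a minimal sketch. It assumes a plain JDK DOM parser; the class name `DescriptionText` and its methods are made up for the example and are not part of the question's code. It calls getTextContent() on the "description" node, once for the original sample and once for the CDATA variant.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this illustration.
class DescriptionText {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the concatenated text of the first <description> element,
    // or "" if the document has no such element.
    static String descriptionText(String xml) throws Exception {
        Node description = parse(xml).getElementsByTagName("description").item(0);
        return description == null ? "" : description.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String plain = "<items><item><name>test</name>"
                + "<description>yes <b>no</b></description></item></items>";
        String cdata = "<items><item><name>test</name>"
                + "<description><![CDATA[yes <b>no</b>]]></description></item></items>";

        // All text below <description>; the <b> tags themselves are dropped.
        System.out.println(descriptionText(plain)); // prints "yes no"
        // Inside CDATA the markup is plain character data, so it survives as text.
        System.out.println(descriptionText(cdata)); // prints "yes <b>no</b>"
    }
}
```

Note that getTextContent() on the unescaped sample returns the text of all descendants but not the tags, so the CDATA (or escaped) form is the one that keeps the markup visible as a string.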

Jayson Lorenzen · Accepted Answer · 2012-09-27 15:21:24Z

I agree with the comments that this question is not complete or specific enough for a direct answer (like modifying your source to make it work), but I had to do something somewhat similar (I think), so I can add this. It might help.

So if, IF, the content of the "description" element were valid XML all by itself, say the document actually looked like this:

<items>
  <item>
   <name>test</name>
   <description><span>yes <b>no</b></span></description>
  </item>
</items>

then you could hack out the content of the "description" element as a new XML Document and then get the XML text from that, which would then look like:

<span>yes <b>no</b></span>

So a method something like:

/**
 * Get the Description as a new XML document
 *
 */
public Document retrieveDescriptionAsDocument(Document sourceDocument) {

Node tmpNode;
Document document2 = null;

try {
    // get the description node, I am just using XPath here as it is easy
    // to read, you already have a reference to the node so just continue as you
    // were doing for that, bottom line is to get a reference to the node
    tmpNode = org.apache.xpath.XPathAPI.selectSingleNode(sourceDocument,"/items/item/description");

    if (tmpNode != null) {

        // create a new empty document
        document2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // find the single element child of "description" (the <span> in the sample)
        Node content = tmpNode.getFirstChild();
        while (content != null && content.getNodeType() != Node.ELEMENT_NODE) {
            content = content.getNextSibling();
        }
        if (content != null) {
            // importNode makes a deep copy owned by the new document,
            // so the source document is left unchanged
            document2.appendChild(document2.importNode(content, true));
        }
    }
    else {
        // LOG WARNING
        yourLoggerOrWhatever.warn("retrieveDescriptionAsDocument: No data found for XPath: /items/item/description");
    }

    } catch (Exception e) {
        // LOG ERROR
        yourLoggerOrWhatever.error("Exception caught getting contained document:",e);
    }

    // return the new doc, and the caller can then output that new document, that will now just contain "<span>yes <b>no</b></span>" as text, apply an XSL or whatever
    return document2;
}
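
The caller still has to turn the resulting node or document into a string. Here is a minimal sketch of that step, assuming the standard javax.xml.transform identity transformer is available. `NodeSerializer`, `toXmlString`, and `descriptionMarkup` are hypothetical names, and the sketch uses getElementsByTagName instead of XPath so that it only needs the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this illustration.
class NodeSerializer {

    // Serializes a DOM node and its subtree to markup, without the XML declaration.
    static String toXmlString(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    // Returns the markup of the first element child of the first <description>,
    // or "" if there is no such element.
    static String descriptionMarkup(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node description = doc.getElementsByTagName("description").item(0);
        if (description == null) {
            return "";
        }
        for (Node child = description.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return toXmlString(child);
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item><name>test</name>"
                + "<description><span>yes <b>no</b></span></description></item></items>";
        System.out.println(descriptionMarkup(xml)); // prints "<span>yes <b>no</b></span>"
    }
}
```

The same `toXmlString` call should also work on the Document returned by the method above, since a DOMSource accepts a whole document as well as a single element.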
