I'm a beginner with XPath expressions.

I have the following URL:

http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None

This page holds the HTML content. In JavaScript, each of the following XPaths returns the same ul element:

  1. //*[@id="moreStock_5257711"]
  2. //*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul
  3. //html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul

How should I get the same ul element in Java using these XPaths?

I tried HtmlCleaner, but it failed for these two XPaths:

"//*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul",
"//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"

It did work for the XPath "//*[@id='moreStock_5257711']". Below is the code I tried with HtmlCleaner:

package com.test.htmlcleaner.HtmlCleaner;

import java.io.IOException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Test {
    public static void main(String[] args) {
        try {
            HtmlCleaner htmCleaner = new HtmlCleaner();
            CleanerProperties cleanerProperties = htmCleaner.getProperties();
            cleanerProperties.setTranslateSpecialEntities(true);
            cleanerProperties.setTransResCharsToNCR(true);
            cleanerProperties.setOmitComments(true);

            String s = "http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None";
            // Fetch the page with jsoup, then hand the HTML to HtmlCleaner.
            Document doc = Jsoup.connect(s).timeout(30000).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2").get();

            String pageContent = doc.toString();
            TagNode node = htmCleaner.clean(pageContent);
            Object[] statsNode = node.evaluateXPath("//*[@id='moreStock_5257711']");
            if (statsNode.length > 0) {
                for (int i = 0; i < statsNode.length; i++) {
                    TagNode resultNode = (TagNode) statsNode[i];
                    System.out.println("hi");
                    System.out.println("Element Text : " + resultNode.getText().toString().trim());
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (XPatherException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

I need all of these XPaths to work with a single Java package.

Can anyone suggest how to get all of these XPath expressions working, so that each one returns the ul element in Java?

Thanks in advance.

  • Show us your Java code as well. Commented Feb 25, 2015 at 7:52
  • If HtmlCleaner can't handle these, can you suggest which package is best for getting all of these XPaths to work? 1. //*[@id="moreStock_5257711"] 2. //*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul 3. //html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul URL: newark.com/white-rodgers/586-902/… Commented Feb 25, 2015 at 8:17

1 Answer

Try to debug the actual HTML DOM tree being created by HtmlCleaner. Use the following code:

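// Needs: import java.io.StringWriter; import org.htmlcleaner.PrettyHtmlSerializer;
String pageContent = doc.toString();
TagNode node = htmCleaner.clean(pageContent);

// Serialize the cleaned tree so you can see exactly what HtmlCleaner built.
StringWriter buffer = new StringWriter();
node.serialize(new PrettyHtmlSerializer(cleanerProperties), buffer);

System.out.println(buffer.toString());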

Now, try to apply all the XPaths on this buffer output and see why they don't work.
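
As far as I know, HtmlCleaner's built-in evaluateXPath covers only a subset of XPath, so an axis such as following-sibling:: may simply not be supported there. One way to test the expressions themselves is to run them through the JDK's own javax.xml.xpath engine against a W3C DOM. The sketch below assumes the markup is well-formed XML and uses a small, hypothetical stand-in for the page (SAMPLE, XPathCheck and select are illustrative names, and the div nesting is much shallower than on the real site). For the real page you would first need a W3C Document, for example via HtmlCleaner's DomSerializer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCheck {
    // Hypothetical, simplified stand-in for the cleaned page; NOT the real newark.com markup.
    static final String SAMPLE =
        "<html><body><div id=\"priceWrap\"><div><div>"
        + "<a href=\"#\">More stock</a>"
        + "<ul id=\"moreStock_5257711\"><li>10 in stock</li></ul>"
        + "</div></div></div></body></html>";

    // Parses well-formed markup and evaluates one XPath with the JDK's XPath 1.0 engine.
    static NodeList select(String xml, String expression) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, dom, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // Same three shapes as in the question; the third is shortened to fit SAMPLE's nesting.
        String[] expressions = {
            "//*[@id='moreStock_5257711']",
            "//*[@id='priceWrap']/div[1]/div/a/following-sibling::ul",
            "//html/body/div/div/div/a/following-sibling::ul"
        };
        for (String expression : expressions) {
            NodeList hits = select(SAMPLE, expression);
            System.out.println(expression + " -> " + hits.getLength() + " match(es)");
        }
    }
}
```

On this sample all three expressions should select the same single ul. If they also match on a DOM built from the real page, the expressions are fine and the limitation is in the evaluator you used.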

2 Comments

I added these two lines after your code: TagNode bufferContent = htmCleaner.clean(buffer.toString()); Object[] statsNode = bufferContent.evaluateXPath("//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"); No luck. Can you share an example with XPaths similar to these?
You misunderstood me. My idea was to examine the output of System.out.println(buffer.toString()); to find out why the XPath is failing. The buffer holds the DOM tree against which your XPath is evaluated, so if a particular XPath is not working, HtmlCleaner has created a tree that is different from the actual HTML source of the page.
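
A rough way to find where a long positional path stops matching the tree is to evaluate it one step at a time and print the match count after each step. This is only a sketch using the JDK's javax.xml.xpath: XPathPrefixDebug, firstFailingStep and SAMPLE are hypothetical names, and the sample markup is made up so that the a element sits inside an extra span, which the path does not expect.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPrefixDebug {
    // Hypothetical tree: the <a> is wrapped in a <span>, so /html/body/div/div/a cannot match.
    static final String SAMPLE =
        "<html><body><div><div>"
        + "<span><a href=\"#\">More stock</a></span>"
        + "<ul><li>10 in stock</li></ul>"
        + "</div></div></body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Builds the path step by step; returns the index of the first step with no matches, or -1.
    static int firstFailingStep(Document dom, String[] steps) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < steps.length; i++) {
            path.append('/').append(steps[i]);
            NodeList hits = (NodeList) xpath.evaluate(path.toString(), dom, XPathConstants.NODESET);
            System.out.println(path + " -> " + hits.getLength());
            if (hits.getLength() == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        String[] steps = {"html", "body", "div", "div", "a", "following-sibling::ul"};
        int failing = firstFailingStep(parse(SAMPLE), steps);
        System.out.println(failing < 0 ? "all steps matched" : "first failing step: " + steps[failing]);
    }
}
```

The step where the count drops to zero is the place to compare against the printed tree, for example to spot a wrapper element that the cleaner added or that the browser's DOM did not have.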
