Find the XPath of an element in HTML page content using Java

Question

I'm a beginner with XPath expressions.

I have the URL below:

http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None

It returns the page's HTML content. In JavaScript, the following XPaths all resolve to the same ul element:

//*[@id="moreStock_5257711"]
//*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul
//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul

Using these XPaths, how should I get the same ul element in Java?

I tried HtmlCleaner, but it failed for these two XPaths:

"//*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul"
"//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"

It worked for the XPath "//*[@id='moreStock_5257711']". Below is the code I tried with HtmlCleaner:

package com.test.htmlcleaner.HtmlCleaner;

import java.io.IOException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Test {
    public static void main(String[] args) {
        try {
            HtmlCleaner htmCleaner = new HtmlCleaner();
            CleanerProperties cleanerProperties = htmCleaner.getProperties();
            cleanerProperties.setTranslateSpecialEntities(true);
            cleanerProperties.setTransResCharsToNCR(true);
            cleanerProperties.setOmitComments(true);

            // Fetch the page with jsoup, then hand its HTML to HtmlCleaner.
            String s = "http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None";
            Document doc = Jsoup.connect(s).timeout(30000).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2").get();

            String pageContent = doc.toString();
            TagNode node = htmCleaner.clean(pageContent);

            // This id-based XPath is the only one of the three that returns a result.
            Object[] statsNode = node.evaluateXPath("//*[@id='moreStock_5257711']");
            if (statsNode.length > 0) {
                for (int i = 0; i < statsNode.length; i++) {
                    TagNode resultNode = (TagNode) statsNode[i];
                    System.out.println("hi");
                    System.out.println("Element Text : " + resultNode.getText().toString().trim());
                }
            }
        } catch (IOException e) {
            // Thrown when the page cannot be fetched.
            e.printStackTrace();
        } catch (XPatherException e) {
            // Thrown when HtmlCleaner cannot evaluate the XPath.
            e.printStackTrace();
        }
    }
}

I need all of these XPaths to work with a single Java package.

Can anyone suggest how to get all of these XPath expressions to return the ul element in Java?

Thanks in advance.

If HtmlCleaner can't handle them, can you suggest which package is best for getting all of these XPaths to work? 1. //*[@id="moreStock_5257711"] 2. //*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul 3. //html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul URL: newark.com/white-rodgers/586-902/…
– SaKol, Commented Feb 25, 2015 at 8:17

Ravi K Thapliyal · Accepted Answer · 2015-02-25 08:48:04Z

Try to debug the actual HTML DOM tree being created by HtmlCleaner. Use the following code:

String pageContent = doc.toString();
TagNode node = htmCleaner.clean(pageContent);

StringWriter buffer = new StringWriter();
node.serialize(new PrettyHtmlSerializer(cleanerProperties), buffer);

System.out.println(buffer.toString());

Now, try to apply all the XPaths on this buffer output and see why they don't work.
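
For comparison, here is a minimal sketch that runs the same three XPath shapes through the JDK's own javax.xml.xpath engine, which implements XPath 1.0 including the following-sibling axis. It is not taken from the question: the class name XPathSketch, the helper firstMatchText, and the SAMPLE markup with the id moreStock_1 are all made up for illustration, and the markup is a small well-formed stand-in, not the real page. As far as I know, HtmlCleaner's evaluateXPath is documented as supporting only a subset of XPath, so an expression that fails there does not necessarily mean the tree is wrong.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical, well-formed stand-in for the cleaned page (not the real newark.com markup).
    static final String SAMPLE =
        "<html><body><div id=\"priceWrap\"><div><div>"
        + "<a href=\"#\">More stock</a>"
        + "<ul id=\"moreStock_1\"><li>5 in stock</li></ul>"
        + "</div></div></div></body></html>";

    // Parses the XML string and returns the trimmed text of the first node
    // matching the expression, or null when nothing matches.
    static String firstMatchText(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
        return node == null ? null : node.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Same three shapes as in the question, adapted to the sample markup.
        String[] expressions = {
            "//*[@id='moreStock_1']",
            "//*[@id='priceWrap']/div[1]/div/a/following-sibling::ul",
            "/html/body/div/div/div/a/following-sibling::ul"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + firstMatchText(SAMPLE, expression));
        }
    }
}
```

If the serialized tree looks right but the axis-based XPaths still return nothing, one option may be to convert the TagNode into an org.w3c.dom.Document (I believe HtmlCleaner ships a DomSerializer for this, but check the version you use) and evaluate the expressions as in the sketch.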

2 Comments

SaKol Over a year ago

I added these two lines after your code: TagNode bufferContent = htmCleaner.clean(buffer.toString()); Object[] statsNode = bufferContent.evaluateXPath("//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"); No luck. Can you share an example that uses similar XPaths?

Ravi K Thapliyal Over a year ago

You misunderstood me. My idea was to examine the output of System.out.println(buffer.toString()); to find out why the XPath is failing. The buffer holds the DOM tree against which your XPath is evaluated, so if a particular XPath does not work, HtmlCleaner has created a tree that differs from the actual HTML source of the page.
