I'm doing some web scraping, and I found what seemed to be the best tool ever at http://selectorgadget.com/. The problem is that Selenium WebDriver in Java finds a different number of results than SelectorGadget does. I think it's an issue with the CSS selectors being produced, but I'm not sure whether the problem lies with Selenium or with SelectorGadget.

Here are the two CSS selectors I'm using. They are supposed to produce the same number of results.

Title: #page-content a:nth-child(1) span
Price: td~ td+ td > div:nth-child(1)

Here is my console output. Notice the BAD URL, where the title and price counts differ:

Category hrefList Initialized
http://www.monoprice.com/Category?c_id=109&cp_id=10910
Titles: 10
Prices: 10
http://www.monoprice.com/Category?c_id=122&cp_id=12212
Titles: 19
Prices: 17
BAD: http://www.monoprice.com/Category?c_id=122&cp_id=12212
http://www.monoprice.com/Category?c_id=117&cp_id=11709
Titles: 4
Prices: 4
http://www.monoprice.com/Category?c_id=109&cp_id=10912
Titles: 2
Prices: 2
http://www.monoprice.com/Category?c_id=117&cp_id=11708
Titles: 9
Prices: 9

I have been trying to debug this. From what I can see, my current CSS selectors perform better than anything else I have tried, but some pages still come back with missing items. I think the only part that sometimes fails is the price grabbing, but I still need to run a full-site test to find out whether title grabbing fails anywhere. I'm aware that some prices are not listed because the items are out of stock, but SelectorGadget had no trouble finding those fields, whereas WebDriver does.

Are there any restrictions in WebDriver causing this, and are there better tools for generating CSS selectors that work well with WebDriver?

Edit: I should mention that I've previously worked with JSoup. I don't know whether WebDriver and JSoup would behave differently on this task.

  • Just to clear up my confusion: the cssSelectors you used for title and price are not giving you the right count, right? Commented Apr 26, 2016 at 12:19
  • Yes, that's correct. My cssSelector works on most pages, but not all. SelectorGadget finds all the results it should, while Selenium WebDriver misses some. Commented Apr 26, 2016 at 12:24

1 Answer

For the title, use this cssSelector:

div.row>a[href$="format=2"] >span

For the price, use this cssSelector:

div.text-red

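For WebDriver, use the code below:

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// Assumes an initialized WebDriver named driver; Thread.sleep throws InterruptedException.
driver.get("http://www.monoprice.com/Category?c_id=122&cp_id=12212");
Thread.sleep(4000); // crude fixed wait so the product list can finish loading

// Titles: the span inside each product link whose href ends with "format=2"
List<WebElement> titles = driver.findElements(By.cssSelector("div.row > a[href$='format=2'] > span"));
System.out.println("title size is " + titles.size());

// Prices: the comma means "or", so this also matches links whose href ends with "notify"
List<WebElement> prices = driver.findElements(By.cssSelector("div.text-red, div.row > a[href$='notify']"));
System.out.println("price size is " + prices.size());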

I ran this code on my own machine and it returned the proper result.

7 Comments

I will run a test with this and come back with results :)
I just ran a short test, and I think this is better in terms of not missing the actual prices. Is there a way to use a || in a cssSelector? If so, I could add a second condition to grab the TBD prices. Here is an example page: link. Later on I'm going to replace the missing prices with -0.01 in my data.
For this, use: div.row>a[href$="notify"]
I'm running into the same problem with WebDriver. When I put div.text-red, div.row>a[href$="notify"] into SelectorGadget, it finds all the proper results. Selenium WebDriver does not find all of them. On this page, for example, Selenium WebDriver finds only 1 of the notify buttons: link
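For the -0.01 placeholder mentioned in the comments above, here is a minimal sketch of how the scraped price text might be converted once it has been collected. It is plain Java with no Selenium calls. The class name PriceFill, the helper names, and the sample strings such as "Notify Me" are assumptions made up for illustration; only the -0.01 value comes from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PriceFill {
    // Placeholder for rows without a listed price (value taken from the comment thread).
    static final double MISSING_PRICE = -0.01;

    // Converts text such as "$1,299.99" to a number.
    // Returns MISSING_PRICE when the text holds no parsable number.
    // Naive on purpose: it keeps every digit and dot it finds in the text.
    static double parsePrice(String text) {
        if (text == null) {
            return MISSING_PRICE;
        }
        String digits = text.replaceAll("[^0-9.]", "");
        if (digits.isEmpty()) {
            return MISSING_PRICE;
        }
        try {
            return Double.parseDouble(digits);
        } catch (NumberFormatException e) {
            // For example a lone "." left over from a label such as "Notify Me."
            return MISSING_PRICE;
        }
    }

    // Applies parsePrice to every scraped string, preserving order and length.
    static List<Double> parseAll(List<String> texts) {
        List<Double> prices = new ArrayList<>();
        for (String text : texts) {
            prices.add(parsePrice(text));
        }
        return prices;
    }

    public static void main(String[] args) {
        // Hypothetical scraped values: two real prices and one notify label.
        List<String> scraped = Arrays.asList("$12.34", "Notify Me", "$1,299.99");
        System.out.println(parseAll(scraped));
    }
}
```

Because parseAll returns one value per input string, the price list stays the same length as the list of scraped elements, which makes a mismatch against the title count easier to spot.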