I'm doing some web scraping, and I found what seemed to be the best tool ever at http://selectorgadget.com/. The problem is that when I use Selenium WebDriver in Java, it finds a different number of results than SelectorGadget does. I think it's an issue with the CSS selectors being produced, but I'm not sure whether the problem is with Selenium or with SelectorGadget.
Here are the two CSS selectors I'm using; they are supposed to produce the same number of results on each page.
Title: #page-content a:nth-child(1) span
Price: td~ td+ td > div:nth-child(1)
Here is my console output; note the URL marked BAD, where the two counts differ:
Category hrefList Initialized
http://www.monoprice.com/Category?c_id=109&cp_id=10910
Titles: 10
Prices: 10
http://www.monoprice.com/Category?c_id=122&cp_id=12212
Titles: 19
Prices: 17
BAD: http://www.monoprice.com/Category?c_id=122&cp_id=12212
http://www.monoprice.com/Category?c_id=117&cp_id=11709
Titles: 4
Prices: 4
http://www.monoprice.com/Category?c_id=109&cp_id=10912
Titles: 2
Prices: 2
http://www.monoprice.com/Category?c_id=117&cp_id=11708
Titles: 9
Prices: 9
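The BAD flag just means the title count and the price count differ for a page. Below is a minimal sketch of that comparison, assuming the counts have already been collected per URL. The class and method names are placeholders I made up for illustration, and the WebDriver lookups are left out entirely:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Selenium): flags pages whose counts disagree.
class SelectorCountCheck {

    // Each map value is {titleCount, priceCount} for one category URL.
    // Returns the URLs where the two counts differ, in insertion order.
    static List<String> findMismatches(Map<String, int[]> countsByUrl) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : countsByUrl.entrySet()) {
            int[] counts = entry.getValue();
            if (counts[0] != counts[1]) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Counts copied from the first two pages of the console output above.
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("http://www.monoprice.com/Category?c_id=109&cp_id=10910", new int[] {10, 10});
        counts.put("http://www.monoprice.com/Category?c_id=122&cp_id=12212", new int[] {19, 17});

        for (String url : findMismatches(counts)) {
            System.out.println("BAD: " + url);
        }
    }
}
```

With the two pages shown, only the c_id=122 page is reported, which matches the log.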
I've been trying to debug this, and the CSS selectors I have now are the best of what I've tried, but some pages still come up with missed items. I think the only part that sometimes fails is the price grabbing, but I still need to run a full-site test to find out whether title grabbing fails anywhere. I'm aware that some prices won't be listed because the items are out of stock, but SelectorGadget had no trouble finding those fields, while WebDriver does.
Are there any restrictions in WebDriver that would cause this, and are there better tools for generating CSS selectors that work well with WebDriver?
Edit: I've also previously worked with JSoup. I don't know whether WebDriver and JSoup would behave differently on this task.