I need to do server-side web scraping and navigation, including on sites that rely on JavaScript, and I need a solution that works on a hosting plan, since I don't have my own server. I came across Python with PySide/PyQt4, which would let me navigate sites like a headless browser. However, I don't know whether it is possible to install this on a remote server or shared host.

1 Answer

If you need a headless browser, you should check out PhantomJS, and in particular PyPhantomJS, the Python implementation. These might work in a shared hosting context, but it really depends on the host. See the build instructions for different platforms; you'd likely need to ask your hosting provider to install it.
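Before contacting the host, it may be worth checking whether a PhantomJS binary is already available. The following is a minimal sketch, assuming the executable is named `phantomjs` and would be on the `PATH` of the account your scripts run under; the helper names `find_phantomjs` and `phantomjs_version` are made up for illustration.

```python
import shutil
import subprocess


def find_phantomjs(binary_name="phantomjs"):
    """Return the full path to the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


def phantomjs_version(path):
    """Ask the binary for its version string; return None if it cannot run."""
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, no execute permission, or a hung process.
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = find_phantomjs()
    if path is None:
        print("phantomjs not found on PATH; ask the host whether it can be installed")
    else:
        print(path, phantomjs_version(path))
```

If this reports nothing, the binary may still exist outside the `PATH`, so a negative result is only a hint, not proof that the host cannot support it.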

If you can get this running, you might be interested in checking out pjscrape (disclaimer: this is my project). It's a command-line tool using PhantomJS to allow scraping using JavaScript and jQuery in a full browser context.
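To give a rough idea of what "scraping in a full browser context" means, here is a small sketch that drives plain PhantomJS from Python. It is not pjscrape and does not use pjscrape's API; it only assumes a `phantomjs` executable on the `PATH`, and the function name `page_title` is hypothetical. The embedded script uses PhantomJS's `webpage` module to load a URL and `page.evaluate` to run a function inside the loaded page.

```python
import os
import shutil
import subprocess
import tempfile

# PhantomJS script: open the URL passed as the first argument, run a
# function inside the loaded page, and print whatever it returns.
PHANTOM_SCRIPT = """
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        console.log('LOAD_FAILED');
        phantom.exit(1);
        return;
    }
    var title = page.evaluate(function () {
        // This function executes in the page itself, after its own
        // JavaScript has had a chance to run.
        return document.title;
    });
    console.log(title);
    phantom.exit(0);
});
"""


def page_title(url, binary="phantomjs"):
    """Return the page's document.title, or None if PhantomJS is missing or fails."""
    exe = shutil.which(binary)
    if exe is None:
        return None
    # Write the script to a temporary file that PhantomJS can read.
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as fh:
        fh.write(PHANTOM_SCRIPT)
        script_path = fh.name
    try:
        result = subprocess.run(
            [exe, script_path, url],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    finally:
        os.unlink(script_path)
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Calling `page_title("http://example.com/")` would return the title as seen by the headless browser, or `None` when the binary is unavailable. Swapping the body of the evaluated function is how you would extract other data from the page.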

4 Comments

Do you know of any solutions implemented in Python, Ruby, or PHP? Something I could just upload to my hosting space?
Also, I think HtmlUnit would probably do the job well, but it is written in Java. Do you know of any web hosts with Java support?
Also, how does your pjscrape work client-side if the same-origin policy prevents JavaScript on one domain from accessing data on another?
1) PyPhantomJS is implemented in Python, as I noted in my answer. It depends on WebKit, though, so I don't know whether installation would be as simple as uploading it. 2) pjscrape runs through PhantomJS, so it's not really "client-side": it injects JS into the current page context, so the script runs within the page being scraped.