
I need a JavaScript function that automatically logs in to a website and then scrapes some details from it. The login details are stored in a server-side database; I need to use them to log in to the website and then scrape some basic information from it.

I've heard this can be done by dynamically loading the URLs in an iframe, but I need to know exactly how to do it.

  • Are you limited to JavaScript? This is possible (and relatively easy) with languages that support cURL (PHP, Python, etc.), but I'm not sure it's even possible with JS. Commented Dec 6, 2011 at 13:01
  • Yes, I am limited to JavaScript. The reason is that we do not want these requests to originate from our server's IP address; ideally, the client's IP address should appear as the one sending the requests. Commented Dec 7, 2011 at 7:07

1 Answer


This sounds like a job for a headless browser such as PhantomJS, rather than trying to use cURL and PHP behind a JavaScript front end. It does require installing some software on your server; that is easy, but it needs command-line access.

PhantomJS is a headless WebKit with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.

PhantomJS is an optimal solution for fast headless testing, site scraping, page capture, SVG rendering, network monitoring and many other use cases.

I have used it for this very purpose myself. You can even inject your favourite JavaScript library (such as jQuery) into the page's DOM to make navigating its elements easier.
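A minimal PhantomJS script for the log-in-then-scrape flow might look like the sketch below. It is not a tested recipe for any particular site: the URLs, the `form#login` and field selectors, the `#account-name` element and the jQuery CDN version are hypothetical placeholders, and the credentials are assumed to be passed on the command line by whatever server-side code reads them from the database. `page.open`, `page.evaluate`, `page.includeJs`, `page.onLoadFinished`, `system.args` and `phantom.exit` are PhantomJS APIs. The functions handed to `page.evaluate` run inside the page, so they cannot use variables from the surrounding script; values are passed in as arguments instead.

```javascript
// login-and-scrape.js: a sketch, run with `phantomjs login-and-scrape.js <username> <password>`
// Hypothetical values: replace the URLs and selectors with those of the real site.
var LOGIN_URL = 'https://example.com/login';
var JQUERY_URL = 'https://code.jquery.com/jquery-1.12.4.min.js';
var DETAILS_SELECTOR = '#account-name';

// Runs inside the page (via page.evaluate): fills in and submits the login form.
// The form and field selectors are assumptions about the target site's markup.
function fillLoginForm(username, password) {
  var form = document.querySelector('form#login');
  form.querySelector('input[name="username"]').value = username;
  form.querySelector('input[name="password"]').value = password;
  form.submit();
}

// Runs inside the page after jQuery has been injected: returns the text to scrape.
function scrapeDetails(selector) {
  return jQuery(selector).text().trim();
}

// Only drive the browser under PhantomJS, which defines the global `phantom` object.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var username = system.args[1];
  var password = system.args[2];
  var scraped = false;

  page.open(LOGIN_URL, function (status) {
    if (status !== 'success') {
      console.log('Could not open the login page');
      phantom.exit(1);
      return;
    }
    // Submitting the form loads a new page; scrape on the first load after that.
    page.onLoadFinished = function () {
      if (scraped) {
        return;
      }
      scraped = true;
      page.includeJs(JQUERY_URL, function () {
        console.log(page.evaluate(scrapeDetails, DETAILS_SELECTOR));
        phantom.exit();
      });
    };
    page.evaluate(fillLoginForm, username, password);
  });
}
```

Calling `form.submit()` bypasses any JavaScript submit handlers on the page; if the site relies on them, clicking the submit button from inside `page.evaluate` is an alternative.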


2 Comments

That sounds like the solution I was looking for. Could we get in touch to discuss it in detail?
You said above (after I answered this question): "The reason for it is the fact that we do not want to originate this request from our server IP address. I would ideally want the client IP address to appear as the one that sends out the requests". PhantomJS would have to run on your server, so the requests would originate from your server's IP address.
