I am using Node.js to open a list of web pages and parse the HTML contents.
I supply the URLs inside the script as an array, then call request to retrieve the HTML, which I then parse with Cheerio.
The problem is that some pages do not include their own URL anywhere in the HTML content.
So I want to determine the URL of the page that I am parsing from within my request callback.
Since request is asynchronous, I cannot rely on the index of the outer loop (which iterates over the array of URL strings) to get the URL: by the time a callback runs, the loop has already finished.
Any ideas?
var request = require('request');
var cheerio = require('cheerio');

var requestList = [ 'https://blahblah.com', 'https://blah2.com' ];
for (var i = 0; i < requestList.length; i++) {
    request(requestList[i], function (error, response, html) {
        if (!error && response.statusCode == 200) {
            var $ = cheerio.load(html);
            ...
            // How can I determine the URL of this HTML body?
        }
    });
}
Thanks for any suggestions!
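
For reference, here is a minimal sketch of one possible approach: iterate with forEach so that each callback closes over its own url variable instead of the shared loop counter. To keep the sketch runnable without network access, fakeRequest and fakeBodies are hypothetical stand-ins for request and the real pages, and fetchAll is a helper name invented for illustration; none of them come from the request library.

```javascript
// Hypothetical page bodies, keyed by URL (stand-ins for real responses).
var fakeBodies = {
  'https://blahblah.com': '<h1>first</h1>',
  'https://blah2.com': '<h1>second</h1>'
};

// Hypothetical stand-in for request(url, callback): calls back asynchronously,
// with the first URL deliberately answering later than the second.
function fakeRequest(url, callback) {
  var delay = url === 'https://blahblah.com' ? 30 : 10;
  setTimeout(function () {
    callback(null, { statusCode: 200 }, fakeBodies[url]);
  }, delay);
}

// Illustrative helper: requests every URL and pairs each body with the URL
// it was requested from. forEach calls the function once per element, so
// every callback sees its own `url`, even though callbacks run after the
// iteration has finished.
function fetchAll(urls, requestFn, done) {
  var results = [];
  var pending = urls.length;
  if (pending === 0) {
    return done(results);
  }
  urls.forEach(function (url) {
    requestFn(url, function (error, response, html) {
      if (!error && response.statusCode == 200) {
        // `url` is the URL this particular request was made with.
        results.push({ url: url, html: html });
      }
      pending -= 1;
      if (pending === 0) {
        done(results);
      }
    });
  });
}

var requestList = ['https://blahblah.com', 'https://blah2.com'];
fetchAll(requestList, fakeRequest, function (results) {
  results.forEach(function (r) {
    console.log(r.url, '->', r.html);
  });
});
```

In the real script, request would be passed in place of fakeRequest, and cheerio.load(html) would go where the result is pushed. Declaring the loop counter with let instead of var in the original for loop should have a similar effect, since let creates a fresh binding per iteration. The request library is also commonly reported to expose the requested URL on the response object (for example as response.request.uri.href), but that is worth checking against the version in use.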