What is considered good practice for parsing an HTML page where the HTML is inserted via JavaScript? When I view the source of the following page, Parcel report, the table and its data do not appear. My best guess is that the table is being inserted via JavaScript. When that is the case, what is a good-practice method of scraping the data?

I was hoping to dump the file into a string and print the table using a method similar to this, but I'm willing to hear any suggestions.

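 $html_import = ???
 // Create the DOM parser before loading the HTML string into it
 $html = new DOMDocument();
 $html->loadHTML($html_import);
 // Collect every table cell and print its text content
 $td = $html->getElementsByTagName('td');
 foreach ($td as $tds) {
     printf(" * %s\n", $tds->textContent);
     echo '<br>';
 }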

  • You can't "scrape" such content, because the JS code will not have executed in PHP. PHP (and DOM) cannot do ANYTHING about JS-generated/inserted code. You need to use other means, e.g. a headless web browser to simulate an actual browser rendering the page, then extract the modified DOM from that. Commented Jun 25, 2013 at 15:31
  • Could you direct me to any source of info? Right, I edited my question a bit. I'm hoping to dump the contents of the HTML into a string and then load the string. Commented Jun 25, 2013 at 15:34
  • stackoverflow.com/questions/6578132/php-headless-browser Commented Jun 25, 2013 at 15:35
  • Also please don't duplicate your own questions. If you changed your mind, edit them and improve them: Parsing HTML tables via DOM Commented Jun 25, 2013 at 15:41
  • That link is nothing at all like what I am asking... A tribe called READ... I didn't change my mind; this is a different task and problem. Read the questions! ENTIRELY different. Commented Jun 25, 2013 at 15:41

1 Answer

If you look at the HTTP requests being made when the page loads you will see the AJAX request go out.

GET http://gis.catawbacountync.gov/_rest/v0/ws_ims_attribute_query.php?parameters=pinc+%3D+%27374219517154%27&table=ws_parcel_report3&fields=*&orderby=&format=json

That is what is actually retrieving the data you want. If you get access to that API you could easily get the information you want.
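As a rough sketch of calling that endpoint directly from PHP instead of parsing the rendered page: the base URL and query parameters below are copied from the request above, while the function names (`buildParcelUrl`, `fetchParcelJson`, `jsonToLines`) are hypothetical. The shape of the JSON response is not shown here, so the sketch assumes nothing about it beyond it being a JSON array or object, and simply prints every leaf value it finds. It also assumes `allow_url_fopen` is enabled.

```php
<?php
// Build the query URL for a given parcel PIN, using the parameters
// seen in the AJAX request above.
function buildParcelUrl(string $pin): string
{
    $base = 'http://gis.catawbacountync.gov/_rest/v0/ws_ims_attribute_query.php';
    $query = [
        'parameters' => "pinc = '" . $pin . "'",
        'table'      => 'ws_parcel_report3',
        'fields'     => '*',
        'orderby'    => '',
        'format'     => 'json',
    ];
    // http_build_query URL-encodes each value (space becomes "+").
    return $base . '?' . http_build_query($query, '', '&');
}

// Fetch the raw JSON body. Requires allow_url_fopen to be enabled.
function fetchParcelJson(string $pin): string
{
    $body = @file_get_contents(buildParcelUrl($pin));
    if ($body === false) {
        throw new RuntimeException('Request failed');
    }
    return $body;
}

// Turn a JSON document into " * key: value" lines, one per leaf value.
// Assumption: the response structure is unknown, so this walks whatever
// nesting it finds rather than relying on specific field names.
function jsonToLines(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response was not a JSON array or object');
    }
    $lines = [];
    array_walk_recursive($data, function ($value, $key) use (&$lines) {
        $lines[] = sprintf(' * %s: %s', $key, $value);
    });
    return $lines;
}

// Usage (performs a real HTTP request, so it is left commented out):
// foreach (jsonToLines(fetchParcelJson('374219517154')) as $line) {
//     echo $line, "\n";
// }
```

Because the data arrives as JSON, no DOM parsing is needed for this route; `json_decode` replaces the `loadHTML` / `getElementsByTagName` step from the question.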

Since this is a government website, I suspect that they are required to give you this information, and thus access to the API.

Please be sure that you are not violating any Terms of Use if you go about piecing together the API through trial and error.

1 Comment

This is public data. I skimmed over the site and saw nothing beyond "don't maliciously attack the site", but I will check again.