This is my first post and I'm sorry if I'm doing it wrong but here we go:
I've been working on a project that should scrape values from a website. The values are variables in a javascript array. I'm using the PHP Simple HTML DOM and it works with the normal scripts but not the one stored in CDATA-blocks. Therefore, I'm looking for a way to scrape data within the CDATA-block. Unfortunately, all the help I could find was for XML-files and I'm scraping from a HTML file.
The javascript I'm trying to scrape is a follows:
<script type="text/javascript">
//<![CDATA[
var data = [{"value":8.41,"color":"1C5A0D","text":"17/11"},{"value":9.86,"color":"1C5A0D","text":"18/11"},{"value":7.72,"color":"1C5A0D","text":"19/11"},{"value":9.42,"color":"1C5A0D","text":"20/11"}];
//]]>
</script>
What I need to scrape is the "value"-variable in the var data.
The problem was that I tried to replace the CDATA string on an object. The following code works perfectly :-)
include('simple_html_dom.php');
$lines = file_get_contents('http://www.virtualmanager.com/players/7793477-danijel-pavliuk/training');
$lines = str_replace("//<![CDATA[","",$lines);
$lines = str_replace("//]]>","",$lines);
$html = str_get_html($lines);
foreach($html->find('script') as $element) {
echo $element->innertext;
}
I will provide you with more information if needed.
//<![CDATA[and//]]>constructs. They're completely pointless and have been for years.$html->find('script')even find anything?str_replace()on a HTML DOM object? What I meant is, download the HTML into a string (usingfile_get_contents()or curl), then search-and-replace that string, and only then parse that string into HTML, usingstr_get_html()instead offile_get_html().