I have the following code snippet, which parses my blog site and stores some information in variables:
global $articles;
$items = $html->find('div[class=blogpost]');
foreach($items as $post) {
    $articles[] = array($post->children(0)->innertext,
                        $post->children(1)->first_child()->outertext);
}
foreach($articles as $item) {
    echo $item[0];
    echo $item[1];
    echo "<br>";
}
The above code produces the following output:
Title of blog post 1 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 2 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/neato" id="963" target="_blank" >Click here for neato</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 3 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/lame" id="963" target="_blank" >Click here for lame</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
with $item[0] containing "Title of blog post X" and $item[1] containing the rest.
What I want to do is parse $item[1] and retain only the URL contained within it as a separate variable. Perhaps I am not phrasing my question correctly, but I cannot find anything that can help me figure this out.
Can anyone help me?
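Use preg_match('/href="(.*?)"/si', $source, $match); to get the href value in the string. The pattern needs delimiters (the slashes here), otherwise preg_match fails with a warning. After a successful match the URL is in $match[1]. In your loop, $source would be $item[1].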
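To make that concrete, here is a minimal sketch of the regex approach applied to a string shaped like your output. The helper name extract_href and the $sample string are my own, made up for illustration; they are not part of your code or of any library.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// double-quoted href value found in an HTML fragment, or null if none.
function extract_href(string $html): ?string
{
    // The '/' delimiters are required by PCRE. 'i' makes the match
    // case-insensitive, and the lazy (.*?) stops at the first closing quote.
    if (preg_match('/href="(.*?)"/i', $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Sample fragment modelled on the output shown in the question.
$sample = '<script type="text/javascript">execute_function(3,\'\')</script>'
        . '<a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a>';

$url = extract_href($sample);
echo $url, "\n"; // prints http://www.example.com/cool_news
```

In your loop you could then store extract_href($item[1]) alongside the title. Note that this only handles double-quoted href attributes and returns the first one it finds. Since you are already using an HTML parser, asking it for the anchor directly (something like $post->find('a', 0)->href, assuming the first link inside each post is the one you want) would probably be more robust than a regex.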