Remove part of string from array value in PHP

Question

I have the following code snippet, which essentially parses my blog site and stores some information in variables:

global $articles;

$items = $html->find('div[class=blogpost]'); 

foreach($items as $post) {
    $articles[] = array($post->children(0)->innertext,
                        $post->children(1)->first_child()->outertext);
}

foreach($articles as $item) {
    echo $item[0]; 
    echo $item[1];
    echo "<br>";
}

The above code outputs the following:

Title of blog post 1 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/cool_news" id="963"  target="_blank" >Click here for news</a> &nbsp;<img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 2 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/neato" id="963"  target="_blank" >Click here for neato</a> &nbsp;<img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 3 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/lame" id="963"  target="_blank" >Click here for lame</a> &nbsp;<img src="/news.gif" width="12" height="12" title="validated" /><span class="title">

with $item[0] containing "Title of blog post X" and $item[1] containing the rest.

What I want to do is parse $item[1] and retain only the URL contained within it as a separate variable. Perhaps I am not phrasing my question correctly, but I cannot find anything that can help me figure this out.

Can anyone help me?

Use preg_match with something like preg_match('/href="(.*?)"/si', $source, $match); to get the href value in the string.
– adeneo, Commented Dec 21, 2012 at 20:00
You're already parsing the HTML with a proper parser. You want to continue parsing on the <a> tag. You're doing it the right way now. Don't resort to regular expressions!
– Andy Lester, Commented Dec 21, 2012 at 20:21
Thing is, I don't know how to parse it further. The parser I am using doesn't appear to be supported any longer: net.tutsplus.com/tutorials/php/…
– Sweepster, Commented Dec 21, 2012 at 20:25

Adam Elsodaney · Accepted Answer · 2012-12-21 20:35:12Z

2

If you were to parse $item[1] into whatever DOM crawler object you were using for $html (called $fragment here), you could use the following XPath. Note that XPath positions start at 1, not 0:

$fragment->find('//a[1]/@href');

which will return

href="http://www.example.com/cool_news"

Then extract the URL however you want, either in PHP or by refining the XPath query. I'm not sure what the XPath would be to get only the value; perhaps someone can expand on that.
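
One way to get only the value is XPath's string() function. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than the unspecified crawler mentioned above. The variable names and the shortened sample fragment are assumptions for illustration, not taken from the real page.

```php
<?php
// Sample fragment shaped like $item[1] from the question (shortened).
$fragment = '<script type="text/javascript">execute_function(3,\'\')</script>'
    . '<a href="http://www.example.com/cool_news" id="963" target="_blank">Click here for news</a>'
    . ' &nbsp;<img src="/news.gif" width="12" height="12" title="validated" />';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($fragment);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string(...) converts the selected attribute node to its text value.
// XPath positions start at 1, so (//a)[1] is the first anchor in the document.
$url = $xpath->evaluate('string((//a)[1]/@href)');

echo $url, PHP_EOL;
```

With this sample fragment, $url should hold only http://www.example.com/cool_news, without the href="..." wrapper. If no anchor is found, string() yields an empty string, so it is worth checking for that before using the result.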

EDIT: Seeing as you are using Simple HTML DOM Parser, try the following:

$blogItemHtml = new simple_html_dom();
$blogItemHtml->load($item[1]);

$anchors = $blogItemHtml->find('a');
echo $anchors[0]->href; // "http://www.example.com/cool_news"
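
Since the question raises a concern that the parser may no longer be supported, here is a rough sketch of the same idea using only PHP's built-in DOM extension, collecting the title and URL for each post in a single pass. The page markup, the array keys, and the variable names are hypothetical; the real page structure is not shown in the question, so the XPath expressions would need adjusting.

```php
<?php
// Hypothetical page markup, loosely mirroring the structure in the question:
// the first child holds the title, the second child wraps the link.
$page = <<<HTML
<div class="blogpost"><h2>Title of blog post 1</h2><div><span><a href="http://www.example.com/cool_news">Click here for news</a></span></div></div>
<div class="blogpost"><h2>Title of blog post 2</h2><div><span><a href="http://www.example.com/neato">Click here for neato</a></span></div></div>
HTML;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$articles = [];

foreach ($xpath->query('//div[@class="blogpost"]') as $post) {
    $articles[] = [
        // Text of the post's first child element.
        'title' => trim($xpath->evaluate('string(*[1])', $post)),
        // href value of the first anchor anywhere inside this post.
        'url'   => $xpath->evaluate('string((.//a)[1]/@href)', $post),
    ];
}

foreach ($articles as $item) {
    echo $item['title'], ' -> ', $item['url'], PHP_EOL;
}
```

Storing the URL as its own array element means the rest of the markup in the link wrapper never has to be carried around or stripped afterwards.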

edited Dec 21, 2012 at 20:35

answered Dec 21, 2012 at 20:16


2 Comments

Sweepster Over a year ago

This is the parser I am using: net.tutsplus.com/tutorials/php/…

Adam Elsodaney Over a year ago

@Jonathan I've made an edit to my answer; hopefully it helps.
