Extract a particular part of a large HTML code block stored in a PHP variable

Question

I have the embeddable code of a slide, shown below. The whole HTML is stored in a variable, $embed_code.

I am printing this code in PHP. Now I want to extract one piece of this HTML string.

The code is written below. I want only the code inside the <object> tag.

$embed_code = '
 <div style="width:425px" id="__ss_617490"><strong style="display:block;
 margin:12px 0 4px"><a href="http://www.slideshare.net/al.capone/funny-beer-babies-
 enginnering-rev-2-presentation" title="Funny beer babies enginnering rev. 
 2">Funny beer babies enginnering rev. 2</a></strong>


<object id="__sse617490" 
 width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com
/swf/ssplayer2.swf?doc=becoming-an-engineer-1222340701618958-9&stripped_title=funny-  
 beer-babies-enginnering-rev-2-presentation&userName=al.capone" /><param  
 name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/>
 <embed name="__sse617490" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=
  becoming-an-engineer-1222340701618958-9&stripped_title=funny-beer-babies-enginnering-
  rev-2-presentation& userName=al.capone" type="application/x-shockwave-flash" 
   allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed> 
  </object>




 <div style="padding:5px 0  12px">View more<a href="http://www.slideshare.net
  /"> presentations</a> from <a href="http://www.slideshare.net/al.capone">
  al.capone</a>.</div></div>';

Now I want only the part of this string from <object id="....." to "</embed> </object>". The whole HTML is generated dynamically, so any ideas would be welcome.

How can I do this? Is there a PHP function that can extract the HTML inside a given tag?

@soju: I'd +1 for suggesting a dom parser, but there's no way to -99999999 for suggesting regexes. So... +0 it is.
– Marc B, Commented Nov 8, 2011 at 16:29
HTML markup and "simple regex" are mutually exclusive terms!
– GordonM, Commented Nov 8, 2011 at 16:37

GordonM · Accepted Answer · 2011-11-08 16:36:13Z

1

Use the DOMDocument classes.

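$dom = new DOMDocument();
libxml_use_internal_errors(true); // the embed markup is not strictly valid HTML; collect parser warnings instead of printing them
$dom->loadHTML($embed_code);
$htmlObject = $dom->getElementById('__sse617490'); // Returns a DOMElement, or null if no element has that ID
$objectHtml = $dom->saveHTML($htmlObject); // Serializes just that element back to an HTML string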

http://www.php.net/dom

answered Nov 8, 2011 at 16:36

GordonM


6 Comments

Treffynnon Over a year ago

+1; PHPQuery, which I mentioned in my answer simply wraps this with a nicer (in my opinion) API.

Manish Jangir Over a year ago

But I said that this HTML is generated dynamically, so the ID of the div will change with every new slide.

GordonM Over a year ago

In that case you need some way of consistently identifying the <object> for every slide. If the <object> on the page is the only object tag, then you can simply use getElementsByTagName(). If not, then you'll need to modify the code that generates the markup so that the object can be distinguished from all other markup on the page, perhaps by adding a class.

Treffynnon Over a year ago

@rajzana You want $dom->getElementsByTagName('object');. See: php.net/manual/en/domdocument.getelementsbytagname.php

Treffynnon Over a year ago

@GordonM He appears to be scraping Slideshare so I don't think he can change the markup.
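
When the ID changes from slide to slide, the tag-name lookup suggested in the comments above could look roughly like the sketch below. It assumes the embed code contains only one <object> element (or that the first one is the one wanted). The helper name extractFirstObject and the sample markup are made up for illustration and are not part of the original thread.

```php
<?php
// Hypothetical helper: returns the HTML of the first <object> element, or null if there is none.
function extractFirstObject(string $html): ?string
{
    $dom = new DOMDocument();
    // Embed snippets are rarely valid HTML; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $objects = $dom->getElementsByTagName('object'); // DOMNodeList
    if ($objects->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serializes only that node (PHP 5.3.6+).
    return $dom->saveHTML($objects->item(0));
}

// Illustrative sample, not the real Slideshare markup.
$sample = '<div id="__ss_1"><object id="__sse1" width="425">'
        . '<param name="movie" value="x.swf"></object><p>View more</p></div>';
$objectHtml = extractFirstObject($sample);
```

Because the lookup is by tag name, nothing here depends on the generated ID. If several <object> elements can appear, the DOMNodeList can be looped over instead of taking item(0).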


Treffynnon · Accepted Answer · 2011-11-08 16:46:21Z

1

I like using PHPQuery to parse and extract data from HTML with PHP. It uses jQuery's simple CSS-style selectors for traversing the code.

So it would be:

require('phpQuery/phpQuery.php');
$doc = phpQuery::newDocumentHTML($embed_code);
$div = pq('div#__ss_617490'); // select a DIV with the specified ID
var_dump($div->attr('style')); //To get the style attribute
var_dump($div->html()); // To get the inner html

// now to get the object tag like you desire.
$object_tag = pq('object');

// only get the first object
$object_tag = pq('object:first');

edited Nov 8, 2011 at 16:46

answered Nov 8, 2011 at 16:22

Treffynnon



daiscog · Accepted Answer · 2011-11-08 16:37:07Z

-1

You could just use a regex to parse and extract it:

$embed_code = "blah blah <object ...>and other code here</object> blah blah";

$matches = array();
preg_match('#<object\b[^>]*>(.*?)</object>#is', $embed_code, $matches); // s: "." also matches newlines; .*? stops at the first </object>

// $matches[0] = "<object ...>and other code here</object>"
// $matches[1] = "and other code here"
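
The real embed code spans several lines, so a regex for it needs the s modifier for "." to match newlines. A minimal sketch of that case follows, using a made-up sample string and assuming there are no nested <object> tags and no ">" inside attribute values. The variable names $sample, $outer and $inner are illustrative only.

```php
<?php
// Illustrative multi-line sample, not the real Slideshare markup.
$sample = "<div>\n<object id=\"__sse1\"\n width=\"425\"><param name=\"movie\" value=\"x.swf\"/>\n</object>\n<p>View more</p></div>";

// i: case-insensitive; s: "." also matches newlines; .*? stops at the first </object>.
$pattern = '#<object\b[^>]*>(.*?)</object>#is';

if (preg_match($pattern, $sample, $matches) === 1) {
    $outer = $matches[0]; // the whole <object ...>...</object> element
    $inner = $matches[1]; // only the content between the tags
} else {
    $outer = $inner = null; // no <object> element found
}
```

Checking the return value of preg_match() before reading $matches avoids undefined-index notices when the markup contains no <object> element.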

answered Nov 8, 2011 at 16:37

daiscog


7 Comments

Treffynnon Over a year ago

As discussed by @MarcB 8 minutes ago, regex isn't the best or cleanest solution to an HTML parsing problem.

daiscog Over a year ago

@Treffynnon This depends on the context - sometimes creating a whole DOM structure in memory just to extract part of the text it contains is overkill and a regex is more efficient.

Treffynnon Over a year ago

Clarity is of more importance. Memory is cheap. Time wasted debugging code is expensive.

daiscog Over a year ago

Memory may be of importance in some circumstances. And what about processing time? Sometimes a regex will be quicker than a DOM parser. Again, it all boils down to context and considerations such as the level of control over the input (user/system generated, always well-formed?) should be taken into account. Hence why my post says "could" not "should".

Treffynnon Over a year ago

To be clear I did not down vote this answer. Someone else must feel even more strongly than me!
