I have the embeddable code of a slide, shown below. The whole HTML is stored in a variable, $embed_code.

I am printing this code in PHP. Now I want to extract one piece of this HTML string.

The code is below. I want only the part between the <object> tags.

$embed_code = '
 <div style="width:425px" id="__ss_617490"><strong style="display:block;
 margin:12px 0 4px"><a href="http://www.slideshare.net/al.capone/funny-beer-babies-
 enginnering-rev-2-presentation" title="Funny beer babies enginnering rev. 
 2">Funny beer babies enginnering rev. 2</a></strong>


<object id="__sse617490" 
 width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com
/swf/ssplayer2.swf?doc=becoming-an-engineer-1222340701618958-9&stripped_title=funny-  
 beer-babies-enginnering-rev-2-presentation&userName=al.capone" /><param  
 name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/>
 <embed name="__sse617490" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=
  becoming-an-engineer-1222340701618958-9&stripped_title=funny-beer-babies-enginnering-
  rev-2-presentation& userName=al.capone" type="application/x-shockwave-flash" 
   allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed> 
  </object>




 <div style="padding:5px 0  12px">View more<a href="http://www.slideshare.net
  /"> presentations</a> from <a href="http://www.slideshare.net/al.capone">
  al.capone</a>.</div></div>';

I want only the part of this string from <object id="....." to </embed> </object>. The whole HTML is generated dynamically, so I need a general approach.

How can I do this? Is there a PHP function that can extract the HTML inside a given tag?

  • You can use a regexp or a DOM parser. Commented Nov 8, 2011 at 16:22
  • @soju: I'd +1 for suggesting a DOM parser, but there's no way to -99999999 for suggesting regexes. So... +0 it is. Commented Nov 8, 2011 at 16:29
  • Well, in this particular case, a simple regexp is enough. Commented Nov 8, 2011 at 16:31
  • HTML markup and "simple regex" are mutually exclusive terms! Commented Nov 8, 2011 at 16:37

3 Answers

Use the DOMDocument class.

$dom = new DOMDocument();
$dom->loadHTML($embed_code);
$htmlObject = $dom->getElementById('__sse617490'); // Returns a DOMElement, or null if the ID is not found

http://www.php.net/dom
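
Since the ID changes for every slide, selecting by tag name may be more practical than selecting by ID. The following is a minimal sketch, assuming the embed code contains only one <object> element; the function name extractFirstObject and the shortened sample markup are hypothetical and stand in for the real $embed_code.

```php
<?php
// Hypothetical, shortened sample input standing in for the generated $embed_code.
$embed_code = '<div id="__ss_1"><strong>Title</strong>'
    . '<object id="__sse1" width="425" height="355">'
    . '<param name="movie" value="http://example.com/player.swf" />'
    . '<embed name="__sse1" src="http://example.com/player.swf" width="425" height="355"></embed>'
    . '</object><div>View more</div></div>';

// Returns the first <object> element as an HTML string, or null if there is none.
function extractFirstObject(string $html): ?string
{
    $dom = new DOMDocument();

    // Scraped markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $objects = $dom->getElementsByTagName('object'); // DOMNodeList
    if ($objects->length === 0) {
        return null;
    }

    // Passing a node to saveHTML() serializes only that node (PHP 5.3.6 or later).
    return $dom->saveHTML($objects->item(0));
}

$object_html = extractFirstObject($embed_code);
echo $object_html, "\n";
```

Note that the parser normalizes the markup, so the returned string is equivalent to the original <object> block but not necessarily identical to it character for character.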

6 Comments

+1; phpQuery, which I mentioned in my answer, simply wraps this with a nicer (in my opinion) API.
But I said that this HTML is generated dynamically, so the ID of the div will change with every new slide.
In that case you need some way of consistently identifying the <object> for every slide. If the <object> is the only object tag on the page, you can simply use getElementsByTagName(). If not, you'll need to modify the code that generates the markup so that the object can be distinguished from all other markup on the page, perhaps by adding a class.
@rajzana You want $dom->getElementsByTagName('object');, which returns a list of the matching elements. See: php.net/manual/en/domdocument.getelementsbytagname.php
@GordonM He appears to be scraping Slideshare so I don't think he can change the markup.

I like using phpQuery to parse and extract data from HTML with PHP. It uses jQuery's simple CSS-style selectors for traversing the document.

So it would be:

require('phpQuery/phpQuery.php');
$doc = phpQuery::newDocumentHTML($embed_code);
$div = pq('div#__ss_617490'); // select a DIV with the specified ID
var_dump($div->attr('style')); //To get the style attribute
var_dump($div->html()); // To get the inner html

// now to get the object tag like you desire.
$object_tag = pq('object');

// only get the first object
$object_tag = pq('object:first');

You could just use a regex to parse and extract it:

$embed_code = "blah blah <object ...>and other code here</object> blah blah";

$matches = array();
preg_match('#<object\b[^>]*>(.*?)</object>#is', $embed_code, $matches);

// $matches[0] = "<object ...>and other code here</object>"
// $matches[1] = "and other code here"
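
The embed code in the question spans several lines, so a pattern has to let "." cross newlines (the s modifier) and has to accept any number of attributes on the opening tag. A minimal sketch under those assumptions follows; the function name matchObjectBlock and the shortened sample markup are hypothetical.

```php
<?php
// Hypothetical multi-line sample standing in for the generated $embed_code.
$embed_code = "<div>\n<object id=\"x\"\n width=\"425\"><param name=\"movie\" value=\"a\"/>\n"
    . "<embed src=\"a\"></embed>\n</object>\n<div>View more</div></div>";

// Returns the first <object>...</object> block, or null if none is found.
function matchObjectBlock(string $html): ?string
{
    // \b     keeps tags such as <objectfoo> from matching
    // [^>]*  spans the attributes of the opening tag
    // (.*?)  is lazy, so it stops at the first </object>
    // s      lets "." match newlines; i ignores tag case
    if (preg_match('#<object\b[^>]*>(.*?)</object>#is', $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

$object_html = matchObjectBlock($embed_code);
echo $object_html, "\n";
```

This only holds for input of a known shape: nested <object> tags or a ">" inside an attribute value would break the match, which is where a DOM parser is the safer choice.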

7 Comments

As discussed by @MarcB 8 minutes ago, regex isn't the best or cleanest solution to an HTML parsing problem.
@Treffynnon This depends on the context - sometimes creating a whole DOM structure in memory just to extract part of the text it contains is overkill and a regex is more efficient.
Clarity is of more importance. Memory is cheap. Time wasted debugging code is expensive.
Memory may be of importance in some circumstances. And what about processing time? Sometimes a regex will be quicker than a DOM parser. Again, it all boils down to context and considerations such as the level of control over the input (user/system generated, always well-formed?) should be taken into account. Hence why my post says "could" not "should".
To be clear I did not down vote this answer. Someone else must feel even more strongly than me!