I have a complex HTML string similar to:
some text <blockquote>main text<blockquote>quotation</blockquote>end of main text</blockquote> some other text
Using PHP, I want to extract the entire content of the first blockquote, even if it contains nested blockquotes:
main text<blockquote>quotation</blockquote>end of main text
The difficult part is that I need to stop cutting the string at the correct closing tag: the one that matches the first opening tag. In this example that is the last closing tag, but it has to be determined dynamically.
This is the attempt I have so far:
<?php
$some_html = "<blockquote>main text<blockquote>quotation</blockquote>end of main text</blockquote>";
$result = get_first_element_of_HTML_tag_name($some_html,'blockquote');
function get_first_element_of_HTML_tag_name($html_string,$tag_name) {
    $h = strtolower($html_string);
    $tag_open = "<" . $tag_name . ">";
    $tag_close = "</" . $tag_name . ">";
    $element_start = strpos($h,$tag_open)+strlen($tag_open);
    $element_end = strpos($h,$tag_close);
    $element = substr($h,$element_start,$element_end); // cut to first closing tag
    $element_s = $element;
    $i = 2;
    while ( strpos($element_s,"<blockquote") !== false ) { // as long as substring contains another opening tag
        // include another closing tag in the result
        $element = substr($h,$element_start,nth_strpos($h,$element_end,$i));
        $element_s = substr( $element_s, strpos($element_s,$tag_open)+strlen($tag_open), nth_strpos($element_s,strpos($element_s,$tag_close),$i));
        $i++;
    }
    return $hs; // return complete first element with $tag_name
}
function nth_strpos($str, $substr, $n) {
    $ct = 0;
    $pos = 0;
    while ( ( $pos = strpos($str, $substr, $pos) ) !== false ) {
        if (++$ct == $n) {
            return $pos;
        }
        $pos++;
    }
    return false;
}
?>
$result comes back blank.
I think the problem is somewhere in the nth_strpos function.
Any help, or an even simpler alternative, would be much appreciated!
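One possible simpler alternative, offered only as a sketch, is to stop searching for the "nth" closing tag and instead walk through every opening and closing tag in order while keeping a nesting depth counter. The closing tag that brings the depth back to zero is the one that matches the first opening tag. The function name `first_element_inner_html` is hypothetical and not part of the attempt above. The sketch assumes reasonably well-formed markup: it allows attributes and mixed case, but it does not handle self-closing tags or tag-like text inside comments or scripts.

```php
<?php
/**
 * Return the inner HTML of the first <$tagName> element, including any
 * nested elements of the same name, or null if no balanced element is found.
 * (Hypothetical helper; a sketch, not a full HTML parser.)
 */
function first_element_inner_html(string $html, string $tagName): ?string
{
    // Match opening tags (attributes allowed) and closing tags, case-insensitively.
    // Group 1 captures "/" for a closing tag and "" for an opening tag.
    $pattern = '~<(/?)' . preg_quote($tagName, '~') . '\b[^>]*>~i';

    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
        return null; // no such tag at all
    }

    $depth = 0;    // how many elements of this name are currently open
    $start = null; // byte offset just after the first opening tag

    foreach ($matches as $m) {
        [$tag, $offset] = $m[0];
        $isClosing = ($m[1][0] === '/');

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset + strlen($tag);
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // This closing tag matches the first opening tag.
                // Note that substr() takes a length, not an end position.
                return substr($html, $start, $offset - $start);
            }
        }
        // A closing tag seen before any opening tag is ignored.
    }

    return null; // the first element was never closed
}

$some_html = 'some text <blockquote>main text<blockquote>quotation</blockquote>end of main text</blockquote> some other text';
echo first_element_inner_html($some_html, 'blockquote');
// prints: main text<blockquote>quotation</blockquote>end of main text
```

Because the original string is sliced rather than a lowercased copy, the case of the extracted text is preserved. For markup that is not well formed, a real HTML parser such as PHP's DOM extension is probably a safer choice than any string-based approach.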