
I don't know RegExp well, and I have not succeeded in splitting a string into an array.

I have a string like:

<h5>some text in header</h5>
some other content, that belongs to header <p> or <a> or <img> inside.. not important...
<h5>Second text header</h5>

What I am trying to do is split the string into an array where the KEY is the text of a header and the CONTENT is everything that follows it up to the next header, like:

array("some text in header" => "some other content, that belongs to header...", ...)

  • Better to use an html-parsing library. Commented Jun 21, 2011 at 21:43
  • Don't use regex; use a DOM parser instead. Commented Jun 21, 2011 at 21:45
  • I agree. What is the purpose of this? DOMDocument will turn the string into an object that you can easily work with. Commented Jun 21, 2011 at 21:45
  • What about embedded headers? h2 in the content of h1? Commented Jun 21, 2011 at 21:46
  • Do you need the HTML tags inside the content, or do you need it only as the text? Commented Jun 21, 2011 at 21:49

3 Answers

I would suggest looking at the PHP DOM extension: http://php.net/manual/en/book.dom.php. It lets you read or build a DOM from a document.


4 Comments

Isn't there a fast way to get the array with a regexp?
You may want to run HTML Tidy on your document before trying to parse it into a DOM.
@Martynas It seems like you may want to go about the problem differently.
I was just thinking about splitting with a regexp, so it would be less work.

I've used this one and enjoyed it:

http://simplehtmldom.sourceforge.net/

You could do it with a regex as well, something like this:

/<h5>(.*?)<\/h5>(.*?)<h5>/s

But this only finds the first pair, and only when another <h5> follows it. You'll have to cut the string to get the next one.

Any way you cut it, I don't see a one-liner for you. Sorry.

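Here's a rough version using explode:

$res = array();
// Every chunk after the first holds the header text, then "</h5>", then the content
foreach (explode("<h5>", $html) as $chunk) {
  if (strpos($chunk, "</h5>") === false) continue; // skip anything before the first header
  list($key, $val) = explode("</h5>", $chunk, 2);
  $res[trim($key)] = trim($val);
}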
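
If a regex is still wanted, a single preg_match_all pass can collect every header/content pair instead of cutting the string repeatedly. This is only a minimal sketch, assuming the headers are plain, non-nested <h5> tags without attributes; the sample $html below is made up for illustration.

```php
<?php
// Hypothetical sample input, modelled on the string in the question.
$html = "<h5>some text in header</h5>\n"
      . "some other content, that belongs to header\n"
      . "<h5>Second text header</h5>";

// Group 1: header text (lazy). Group 2: everything up to the next <h5>
// or the end of the string; the lookahead does not consume the next header.
preg_match_all('/<h5>(.*?)<\/h5>(.*?)(?=<h5>|\z)/s', $html, $matches, PREG_SET_ORDER);

$res = [];
foreach ($matches as $m) {
    $res[trim($m[1])] = trim($m[2]);
}

print_r($res);
```

Headers with identical text would overwrite each other here, since the header text is used as the array key.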


Don't parse HTML with preg_match; use PHP's DOMDocument class instead.

Example:

   <?php
   $html = "<h5>some text in header</h5>
   some other content, that belongs to header <p> or <a> or <img> inside.. not important...
   <h5>Second text header</h5>";
   // Create a new DOM object
   $dom = new DOMDocument('1.0', 'utf-8');
   // Options such as preserveWhiteSpace only take effect if set before loading
   $dom->preserveWhiteSpace = false;
   // Load the HTML into the object
   $dom->loadHTML($html);
   $hFive = $dom->getElementsByTagName('h5');
   echo $hFive->item(0)->nodeValue; // change the index to read the other h5 elements
   ?>
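
To get the header => content array the question asks for, the same DOMDocument approach can be extended by walking each h5's following siblings until the next h5 is reached. This is a rough sketch, not a definitive implementation: it assumes all h5 elements are siblings inside one container, and the wrapper div and the sample markup are assumptions added for illustration.

```php
<?php
// Hypothetical sample input with well-formed inner markup.
$html = "<h5>some text in header</h5>\n"
      . "some other content, that belongs to header <p>a paragraph</p>\n"
      . "<h5>Second text header</h5>\n"
      . "more content";

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
// Wrap the fragment so every header and text node shares one parent.
$dom->loadHTML('<div id="wrap">' . $html . '</div>');
libxml_clear_errors();

$res = [];
foreach ($dom->getElementsByTagName('h5') as $h5) {
    $content = '';
    // Collect sibling nodes (text and elements) until the next h5.
    for ($node = $h5->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h5') {
            break;
        }
        $content .= $dom->saveHTML($node);
    }
    $res[trim($h5->textContent)] = trim($content);
}

print_r($res);
```

The content values keep their inner HTML tags; swapping saveHTML($node) for $node->textContent would give plain text instead.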

