I'm trying to extract parts of some HTML text: each separate list under the h3 tags, and the images at the bottom. Any help would be great.

Thank you

Here is the sample text:

<h3>Item Summary</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam dictum adipiscing nulla. Aenean id leo non urna sollicitudin lobortis. Sed malesuada diam ut elit accumsan auctor. Proin nisl orci, tempor sed pulvinar ut, semper id nisl. Quisque pellentesque porta facilisis. Duis vestibulum pellentesque commodo. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nulla facilisi. Etiam eget lacus mauris, non accumsan erat. Etiam gravida posuere sollicitudin. Cras id sodales diam. </p>
<h3>Item Features</h3>
<ul>
<li>Feature 1.</li>
<li>Feature 2.</li>
<li>Feature 3.</li>
<li>Feature 4.  </li>
<li>Feature 5.</li>
</ul>
<h3>Item Details</h3>
<ul>
<li>Detail 1</li>
<li>Detail 2</li>
<li>Detail 3</li>
<li>Detail 4</li>
<li>Detail 5</li>
</ul>
<h3>Contact Information</h3>
<ul>
<li>Contact 1</li>
<li>Contact 2</li>
<li>Contact 3</li>
<li>Contact 4</li>

</ul>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>

      <img alt="img1" src="000.jpg"/>
  • Hi, would you mind elaborating? It's not clear what you are trying to do. Commented Feb 25, 2013 at 6:07
  • Are you extracting data from someone else's page? Commented Feb 25, 2013 at 6:10
  • This sounds like a job for XPath rather than regex. Commented Feb 25, 2013 at 6:12

1 Answer

Don't use a regex; use a DOM parser such as DOMDocument or SimpleXMLElement.

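// $yourHTML is the HTML string to parse.
// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Get all lists:
$lists = $finder->query('//ul');

// Get the first list that follows each h3 among its siblings
// (other elements, such as a <p>, may sit in between):
$listsAfterHeader = $finder->query('//h3/following-sibling::ul[position()=1]');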
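
To get from those node lists to usable data, one possible approach is to walk each h3, read the list items that follow it, and collect the image URLs separately. The following is only a sketch under some assumptions: `$html` is a shortened copy of the sample from the question, the `$sections` and `$images` array names are made up for illustration, and the stricter XPath `following-sibling::*[1][self::ul]` is used so that only a list sitting directly after the heading is matched.

```php
<?php
// Shortened sample markup from the question; $html stands in for $yourHTML.
$html = <<<HTML
<h3>Item Summary</h3>
<p>Lorem ipsum dolor sit amet.</p>
<h3>Item Features</h3>
<ul>
<li>Feature 1.</li>
<li>Feature 2.</li>
</ul>
<h3>Item Details</h3>
<ul>
<li>Detail 1</li>
<li>Detail 2</li>
</ul>
<p >
   <img height="100px" src="http://www.mydomain.com/Images/123456.jpg" width="200px"/>
</p>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Map each h3 title to the <li> texts of the list directly after it.
$sections = [];
foreach ($finder->query('//h3') as $heading) {
    // Matches only when the very next element sibling is a <ul>,
    // so "Item Summary" (followed by a <p>) is skipped.
    $items = $finder->query('following-sibling::*[1][self::ul]/li', $heading);
    if ($items->length === 0) {
        continue;
    }
    $title = trim($heading->textContent);
    $sections[$title] = [];
    foreach ($items as $li) {
        $sections[$title][] = trim($li->textContent);
    }
}

// Collect the src attribute of every image in the document.
$images = [];
foreach ($finder->query('//img/@src') as $src) {
    $images[] = $src->nodeValue;
}

print_r($sections);
print_r($images);
```

Keying the result by heading text keeps each list separate, which is what the question asks for. If two headings could share the same title, a plain indexed array of title/items pairs would be a safer shape.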