Build Stripped HTML Array from String in PHP

Question

I have a string which looks something like this:

$html_string = "<p>Some content</p><p>separated by</p><p>paragraphs</p>";

I'd like to do some parsing on the content inside the tags, so I think that creating an array from this would be easiest. Currently I'm using a series of explode and implode to achieve what I want:

$stripped = explode('<p>', $html_string);
$joined = implode(' ', $stripped);
$parsed = explode('</p>', $joined);

which in effect gives:

array('Some content', 'separated by', 'paragraphs');

Is there a better, more robust way to create an array from HTML tags? Looking at the docs, I didn't see any mention of parsing via a regular expression.

Thanks for your help!

DOMDocument is the best way to parse HTML, but there is also php.net/manual/en/function.preg-split.php for regex exploding
– rjdown, Commented Aug 12, 2016 at 20:39
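
The preg_split suggestion in the comment above could look roughly like this. It is only a sketch, assuming the input contains nothing but simple p tags; the pattern and the $parts variable name are illustrative and not taken from the comment.

```php
<?php
// Input string taken from the question.
$html_string = '<p>Some content</p><p>separated by</p><p>paragraphs</p>';

// Split on every opening or closing <p> tag (the opening tag may carry attributes).
// PREG_SPLIT_NO_EMPTY drops the empty strings left between adjacent tags.
$parts = preg_split('/<\/?p\b[^>]*>/i', $html_string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

This replaces the explode/implode chain with a single call, but it is still plain string splitting: nested tags and entities are left untouched in the result.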

Manuel Mannhardt · Accepted Answer · 2016-08-12 20:43:50Z

1

If it's only that simple, with no (or not many) other tags inside the content, you can simply use a regex for that:

$string = '<p>Some content</p><p>separated by</p><p>paragraphs</p>';

preg_match_all('/<p>([^<]*?)<\/p>/mi', $string, $matches);

var_dump($matches[1]);

which creates this output:

array(3) {
  [0]=>
  string(12) "Some content"
  [1]=>
  string(12) "separated by"
  [2]=>
  string(10) "paragraphs"
}

Keep in mind that this is neither the most effective way nor the fastest, but it's shorter than using DOMDocument or anything like that.
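
To make the "no other tags inside the content" caveat concrete, here is a small sketch using a hypothetical input in which the first paragraph contains a nested tag. The pattern is the one from this answer, minus the m flag, which has no effect here because the pattern contains no ^ or $ anchors.

```php
<?php
// Pattern from the answer above, without the redundant /m flag.
$pattern = '/<p>([^<]*?)<\/p>/i';

// Hypothetical input: the first paragraph contains a nested <b> tag.
$string = '<p>Some <b>bold</b> content</p><p>plain text</p>';

preg_match_all($pattern, $string, $matches);

// Only the second paragraph is captured: [^<] cannot cross the "<"
// of the nested tag, so the first paragraph never matches.
print_r($matches[1]);
```

A paragraph with attributes, such as <p class="x">, would be skipped in the same way, because the pattern requires a literal <p>.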


BarakD · Accepted Answer · 2016-08-12 20:42:58Z

0

If you need to do some HTML parsing in PHP, there is a nice library for that called PHP Html Parser (https://github.com/paquettg/php-html-parser), which gives you a jQuery-like API for parsing HTML.

An example:

// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->load('<p>Some content</p><p>separated by</p><p>paragraphs</p>');
$pTags = $dom->find('p');
foreach ($pTags as $tag) {
    // do something with the html
    $content = $tag->innerHtml;
}


trincot · Accepted Answer · 2016-08-12 21:11:27Z

Here is the DOMDocument solution (native PHP), which will also work when your p tags have attributes, or contain other tags like <br>, or have lots of white-space in between them (which is irrelevant in HTML rendering), or contain HTML entities like &nbsp; or &lt;, etc.:

$html_string = "<p>Some content</p><p>separated by</p><p>paragraphs</p>";
$doc = new DOMDocument();
$doc->loadHTML($html_string);

// Initialise the result so it is defined even when no <p> is found
$paras = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paras[] = $p->textContent;
}

// Output array:
print_r($paras);
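
As a rough illustration of the cases listed above (attributes, nested tags, white-space, entities), the same approach can be wrapped in a function. This is a sketch only: the name paragraphs_to_array, the trim() call and the libxml error handling are additions for illustration and are not part of the original snippet.

```php
<?php
// Hypothetical helper: returns the text of every <p> element in an HTML fragment.
function paragraphs_to_array(string $html): array
{
    // Collect parser warnings internally instead of emitting them,
    // since fragments are often not fully valid HTML.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paras = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // textContent drops nested tags and decodes entities;
        // trim() removes surrounding white-space.
        $paras[] = trim($p->textContent);
    }
    return $paras;
}

// Hypothetical input with an attribute, a nested <br>, extra white-space and entities.
$html = "<p class=\"intro\">Some <br>content</p>\n\n   <p> a &lt; b &amp; c </p>";
print_r(paragraphs_to_array($html));
```

For input containing non-ASCII characters, an explicit charset declaration may be needed, since loadHTML does not assume UTF-8 unless told so.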

If you really want to stick with regular expressions, then at least allow tag attributes and HTML entities, translating the latter to their corresponding characters:

$html_string = "<p>Some content &amp; text</p><p>separated&nbsp;by</p><p style='background:yellow'>paragraphs</p>";

preg_match_all('/<p(?:\s.*?)?>\s*(.*?)\s*<\/p\s*>/si', $html_string, $matches);

// array_map returns the decoded strings; array_walk would discard them
$paras = array_map('html_entity_decode', $matches[1]);

print_r($paras);
