
Assuming an HTML string like this:

$str = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';

Is it possible to convert it into an array like this?

[0] => '<p>Hello World!</p>'
[1] => '<p style="text-align:center">'
[2] => '<img src="foo.png" />'
[3] => '<br />'
[4] => '</p>'

I tried using DOMDocument many different ways but the problem seems to always boil down to parenting. I need to traverse the HTML without regard for parent/child relationships.

  • Hi again greener, is $str formatted exactly like that? Did you copy and paste it? Commented May 12, 2016 at 1:38
  • According to an earlier comment HTML is all one line. Commented May 12, 2016 at 1:40
  • @chris85 So I don't see the point of pasting it like that... Commented May 12, 2016 at 1:44
  • Yeah, I figured the OP would update it to the actual format. Commented May 12, 2016 at 1:55
  • @greener you aren't helping us to help you. roger out. Commented May 12, 2016 at 2:02

4 Answers


@olibiaz's answer will do.

Just wanted to show another way of doing this using preg_split.

$str = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
$regex = '/(<[a-z0-9=\-:." ^\/]+\/>)|(<[^\/]+>[^<\/]+<\/[a-z0-9]+>)|(<[a-z0-9=\-:." ^\/]+>)/';
$parts = preg_split( $regex, $str, -1, $flags);

OUTPUT:

array (size=5)
    0 => string '<p>Hello World!</p>' (length=19)
    1 => string '<p style="text-align:center">' (length=29)
    2 => string '<img src="foo.png" />' (length=21)
    3 => string '<br />' (length=6)
    4 => string '</p>' (length=4)
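
The pattern above is dense, so here is an annotated restatement that may make it easier to adapt. It is only a sketch: it assumes a PCRE build where the x (extended) flag ignores whitespace and # comments outside character classes, and it swaps the delimiter to ~ so that slashes need no escaping. The three alternatives are meant to be the same as in the original pattern.

```php
<?php
// Same three alternatives as the pattern above, spread out with the x flag.
// Whitespace inside a character class stays literal, so the space in the
// class still matches the space between attributes.
$regex = '~
    ( < [a-z0-9=\-:." ^/]+ /> )            # 1: self-closing tag such as <br />
  | ( < [^/]+ > [^</]+ </ [a-z0-9]+ > )    # 2: opening tag, text, closing tag
  | ( < [a-z0-9=\-:." ^/]+ > )             # 3: any other single tag
~x';

$str = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Every piece of the string is matched as a delimiter and captured, and
// PREG_SPLIT_NO_EMPTY drops the empty strings left between them.
$parts = preg_split($regex, $str, -1, $flags);
print_r($parts);
```

Note that the character classes only list lowercase letters, so uppercase tag names or attribute characters outside the class (such as ; or #) would not match without extending it.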

Do you want to use a PHP DOM extension for this? Alternatively, if the string has one tag per line, you could simply explode on newlines as follows:

print_r(explode("\n", $str));

Which results in:

Array
(
    [0] => <p>Hello World!</p>
    [1] => <p style="text-align:center">
    [2] =>   <img src="foo.png" />
    [3] =>   <br />
    [4] => </p>
)

3 Comments

the HTML string actually has no line breaks :/
@greener the string in your opening post looks like it has newlines though, is that literally what the string looks like?
no the string is one single line. I just posted it like that for clarity but evidently could have been better

You can use regex in order to achieve that.

$input = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
$regex = '/(<[a-z0-9=\-:." ^\/]+\/>)|(<[^\/]+>[^<\/]+<\/[a-z0-9]+>)|(<[a-z0-9=\-:." ^\/]+>)/';


$result = []; 
preg_match_all($regex, $input, $result);

$result = $result[0];

$result will look like this:

array(5) {
  [0] =>
  string(19) "<p>Hello World!</p>"
  [1] =>
  string(29) "<p style="text-align:center">"
  [2] =>
  string(21) "<img src="foo.png" />"
  [3] =>
  string(6) "<br />"
  [4] =>
  string(4) "</p>"
}

Two important caveats:

  • This regex pattern can certainly be improved; treat it as an example.
  • Test it on other inputs. I only tested it on your particular example, and it could fail on more complex structures, in which case you will need to adapt it to your needs.
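
As one possible way to adapt it, here is a minimal sketch that tokenizes first and merges afterwards instead of encoding everything in a single pattern. The helper name flattenHtml is hypothetical and not part of this answer. It assumes no > appears inside attribute values, does not check that the closing tag name matches the opening one, and does not special-case comments, scripts, or void tags written without a trailing slash (such as a bare <br>).

```php
<?php
/**
 * Hypothetical helper: split an HTML string into a flat list of tags,
 * keeping "open tag + text + close tag" together as one entry.
 */
function flattenHtml(string $html): array
{
    // A token is either a tag ("<" up to the next ">") or a run of text.
    preg_match_all('/<[^>]*>|[^<]+/', $html, $m);
    $tokens = $m[0];

    $isText  = function (string $t): bool { return $t[0] !== '<'; };
    $isClose = function (string $t): bool { return substr($t, 0, 2) === '</'; };
    $isOpen  = function (string $t) use ($isText, $isClose): bool {
        // Not text, not a closing tag, and not self-closing.
        return !$isText($t) && !$isClose($t) && substr($t, -2) !== '/>';
    };

    $out = [];
    $n = count($tokens);
    for ($i = 0; $i < $n; $i++) {
        if ($i + 2 < $n
            && $isOpen($tokens[$i])
            && $isText($tokens[$i + 1])
            && $isClose($tokens[$i + 2])
        ) {
            // Leaf element: emit the three tokens as a single entry.
            $out[] = $tokens[$i] . $tokens[$i + 1] . $tokens[$i + 2];
            $i += 2;
        } else {
            $out[] = $tokens[$i];
        }
    }
    return $out;
}

$input = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
print_r(flattenHtml($input));
```

Because the tokenizer does not list allowed characters, this version also accepts uppercase tag names, at the cost of being less strict about what counts as a tag.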


A solution without regex:

$str = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
$tags = explode( '|', str_replace('><', '>|<', $str));
print_r($tags);

Output:

Array
(
    [0] => <p>Hello World!</p>
    [1] => <p style="text-align:center">
    [2] => <img src="foo.png" />
    [3] => <br />
    [4] => </p>
)
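
One caveat with the placeholder approach: if the text itself contains a | character, explode will split there too. A minimal sketch that avoids the placeholder, still without regex, is to walk the string with strpos and cut between each >< pair. The helper name splitBetweenTags is hypothetical, and like the original it assumes adjacent tags are never separated by whitespace.

```php
<?php
/**
 * Hypothetical helper: split a string at every position where ">" is
 * immediately followed by "<", without inserting a placeholder character.
 */
function splitBetweenTags(string $html): array
{
    $parts = [];
    $start = 0;
    while (($pos = strpos($html, '><', $start)) !== false) {
        // Keep the ">" with the current part; the "<" starts the next one.
        $parts[] = substr($html, $start, $pos - $start + 1);
        $start = $pos + 1;
    }
    // Whatever remains after the last "><" pair is the final part.
    $parts[] = substr($html, $start);
    return $parts;
}

$str = '<p>Hello World!</p><p style="text-align:center"><img src="foo.png" /><br /></p>';
print_r(splitBetweenTags($str));
```

With an input such as '<a>x|y</a><b />', this keeps '<a>x|y</a>' as one entry, whereas the str_replace and explode version would cut it at the | character.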

