-1

How can I get the text between HTML nodes using PHP? Below is my HTML structure.

<html>
<head>
    <title>Test Page</title>
</head>
<body>
    <div id="outer">
        <div id="first">
            <p class="this">Hello</p>
            <p class="this">Community</p>
        </div>
        <div id="second">
            <p class="that">Stack</p>
            <p class="that">Overflow</p>
        </div>
    </div>
</body>
</html>

Expected output:

HelloStackOverflowCommunity
  • Use an HTML parser library, like DOMDocument or PHP Simple HTML DOM. Commented Oct 23, 2014 at 8:07
  • It is not a good idea to use regular expressions to parse HTML. Commented Oct 23, 2014 at 8:08
  • Is there anything you have tried? If so, could you post the code? Commented Oct 23, 2014 at 8:09
  • stackoverflow.com/questions/17054815/… and you might also want to look at strip_tags() Commented Oct 23, 2014 at 8:13
  • Just play with this: php.net/manual/en/class.domdocument.php Commented Oct 23, 2014 at 8:33

5 Answers

1

That's quite easy. Get the PHP Simple HTML DOM Parser here: http://sourceforge.net/projects/simplehtmldom/files/

Then use the following code:

/* include simpledom*/
include('simple_html_dom.php');

/* load html string */
$html_string = <<<HTML
<html>
<head>
    <title>Test Page</title>
</head>
<body>
    <div id="outer">
        <div id="first">
            <p class="this">Hello</p>
            <p class="this">Community</p>
        </div>
        <div id="second">
            <p class="that">Stack</p>
            <p class="that">Overflow</p>
        </div>
    </div>
</body>
</html>
HTML;

/* create simple dom object from html */
$html = str_get_html($html_string);

/* find all paragraph elements */
$paragraph = $html->find('div[id=outer] div p');

/* loop through all elements and print their text; plaintext leaves out any nested tags */
foreach ($paragraph as $p) {
    echo $p->plaintext;
}

Cheers,

Roy


0

I would recommend using PHP's built-in DOMDocument rather than a third-party class like simplehtmldom.

Third-party parsers like that are really slow on big HTML files (I have worked with them).

<?php
$html ='
<html>
<head>
    <title>Test Page</title>
</head>
<body>
    <div id="outer">
        <div id="first">
            <p class="this">Hello</p>
            <p class="this">Community</p>
        </div>
        <div id="second">
            <p class="that">Stack</p>
            <p class="that">Overflow</p>
        </div>
    </div>
</body>
';

// a new DOM object
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;

// load the HTML into the object
$dom->loadHTML($html);
// get the body tag
$body = $dom->getElementsByTagName('body')->item(0);
// loop through the <p> tags only, so wrapper <div>s and their whitespace are skipped
foreach ($body->getElementsByTagName('p') as $element) {
    // print the text of the paragraph
    print $element->textContent;
}

The output would be HelloCommunityStackOverflow
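
The expected output in the question is not in document order, so it may help to collect the paragraph texts into an array first and then join them in whatever order is needed. Below is a minimal sketch of that idea using DOMDocument together with DOMXPath. The helper name paragraphTexts and its parameters are made up for illustration, and it assumes the container id contains no quote characters.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the text of
// every <p> inside the <div> with the given id, in document order.
// Assumes $containerId contains no quote characters.
function paragraphTexts(string $html, string $containerId): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//div[@id="' . $containerId . '"]//p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

$html = '<div id="outer">
    <div id="first"><p class="this">Hello</p><p class="this">Community</p></div>
    <div id="second"><p class="that">Stack</p><p class="that">Overflow</p></div>
</div>';

$texts = paragraphTexts($html, 'outer');
// Pick the pieces in the order the question asks for.
echo $texts[0] . $texts[2] . $texts[3] . $texts[1]; // HelloStackOverflowCommunity
```

Returning an array rather than echoing inside the loop keeps the extraction separate from the ordering, which is the part specific to this question.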


0

Using regular expressions to parse HTML is strongly discouraged.
Use the Simple HTML DOM library: http://sourceforge.net/projects/simplehtmldom/files/simplehtmldom/
Include it: include 'simple_html_dom.php';
Parse your markup (assumed here to be in $html_string): $html = str_get_html($html_string);
Get the tags you need: $tags = $html->find('p');
Create an array of their texts: $a = array(); foreach ($tags as $tag) $a[] = $tag->innertext;
Create your string in the order you want: $string = $a[0] . $a[2] . $a[3] . $a[1];


0

You could try:

$text = strip_tags($html);

http://www.php.net/manual/en/function.strip-tags.php

That will get you quite far. It leaves spaces and returns, but those are easy to remove.

$clean = str_replace(array(' ',"\n","\r"),'',$text);

http://www.php.net/manual/en/function.str-replace.php

Using it on your example gives:

TestPageHelloCommunityStackOverflow

If you want to leave some spaces intact you could instead try:

$clean = trim(implode('',explode("\n",$text)));

which, once the leftover runs of indentation spaces are collapsed, reads:

Test Page Hello Community Stack Overflow

Many variations are possible.
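
One such variation, as a rough sketch: instead of removing newlines and leaving the indentation behind, collapse every run of whitespace into a single space with preg_replace. The helper name htmlToSpacedText is made up for illustration, and the sample markup is a shortened version of the HTML in the question.

```php
<?php
// Hypothetical helper: strip the tags, then collapse each run of
// whitespace (spaces, tabs, newlines) into a single space.
function htmlToSpacedText(string $html): string
{
    $text = strip_tags($html);
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = <<<'HTML'
<html>
<head>
    <title>Test Page</title>
</head>
<body>
    <div id="first">
        <p class="this">Hello</p>
        <p class="this">Community</p>
    </div>
</body>
</html>
HTML;

echo htmlToSpacedText($html); // Test Page Hello Community
```

Note that strip_tags() keeps the text of every element, including the title, so this only fits when the whole page text is wanted.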


-1

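Try this one:

function getTextBetweenTags($string, $tagname)
{
    $tag = preg_quote($tagname, '/');
    // allow attributes on the opening tag; the s modifier lets . match newlines
    $pattern = '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/s';
    // preg_match_all collects every match, not just the first one
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

It returns an array with the text of every matching tag, so you have to loop through it to build your output.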

1 Comment

@jurgemaister damn, beat me to it!
