I want to write a function in PHP that extracts everything inside a given tag from a given HTML string.

Something like this:

Function signature:

function HTMLTextExtrator($htmlString, $tagName)
Return type: string

Example:

$string = '<body><div>this is a <p>text</p> to be extracted</div></body>';
echo HTMLTextExtrator($string, 'p');    // output: text
echo HTMLTextExtrator($string, 'div');  // output: this is a <p>text</p> to be extracted
echo HTMLTextExtrator($string, 'body'); // output: <div>this is a <p>text</p> to be extracted</div>

If anyone knows what code could go inside the function to do this, thanks.

2 Answers

You can try this function to see if it gives you what you want:

<?php

/**
 * Get the text between tags.
 *
 * @param string $tag    The tag name
 * @param string $html   The XML or XHTML string
 * @param int    $strict Whether to use strict mode (1 parses the input as XML)
 *
 * @return array The text content of each matching element
 */
function getTextBetweenTags($tag, $html, $strict=0)
{
    /*** a new DOM object ***/
    $dom = new DOMDocument;

    /*** discard white space; set before loading so it can take effect ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html into the object ***/
    if($strict==1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** get the elements by their tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }
    /*** return the results ***/
    return $out;
}
?>

A sample usage scenario:

<?php

$xhtml = '<html>
<body>
<para>This is a paragraph</para>
<para>This is another paragraph</para>
</body>
</html>';

$content2 = getTextBetweenTags('para', $xhtml, 1);
foreach( $content2 as $item )
{
    echo $item.'<br />';
}
?>
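
Note that nodeValue returns only the text content, so nested tags such as <p> are dropped, while the question expects the inner markup to be kept. A minimal sketch of one way to do that follows. The helper name innerHtmlByTag is hypothetical and not part of the function above, and only the first matching element is returned. It serializes each child node with DOMDocument::saveHTML():

```php
<?php

/**
 * Hypothetical helper (not from the function above): returns the inner
 * HTML of the first element with the given tag name, or '' if none exists.
 */
function innerHtmlByTag(string $html, string $tagName): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementsByTagName($tagName)->item(0);
    if ($element === null) {
        return '';
    }

    // Serialize each child node so that nested tags are preserved.
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$string = '<body><div>this is a <p>text</p> to be extracted</div></body>';
echo innerHtmlByTag($string, 'p'), "\n";
echo innerHtmlByTag($string, 'div'), "\n";
```

Because loadHTML() runs the input through an HTML parser, malformed markup may be repaired before serialization, so the returned string is not guaranteed to match the input byte for byte.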

2 Comments

You can modify the function so that it includes the tags you specify. I believe what you have above should be a good starting point for your work.
If you find it hard to tweak, you can simply use PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net.

Use strip_tags() for this. Tags passed in the second argument are kept; all other tags are stripped.

echo strip_tags('<p>hello</p> <div>World</div>', '<p></p>');

The output will be: <p>hello</p> World
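
When this string is echoed to a browser, the kept <p> tag is rendered rather than displayed, so the page appears to show only "hello World". As a small sketch, assuming you want to see the literal markup in the page, the result can be escaped with htmlspecialchars():

```php
<?php

// Keep <p> tags and strip every other tag.
$stripped = strip_tags('<p>hello</p> <div>World</div>', '<p>');

// Escape the result so a browser displays the tags instead of rendering them.
echo htmlspecialchars($stripped);
```

Viewing the page source, or running the script from the command line, also shows the unescaped string with its tags.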

5 Comments

It only shows hello World in the output, without any tags.
Why is it not showing on my localhost, then? I wrote the same code but had no success.
Check the tags. They have to be the same.
If you echo the output to a browser, the browser will not show you the tags.
@sgtBOSE It does not produce the output the OP wants.
