
I need to perform a series of tests on a URL. The first test is a word count, which I have working perfectly; the code is below:

if (isset($_GET['article_url'])) {
    $title = 'This is an example title';
    // file_get_contents() returns false on failure; @ suppresses its warning
    $str = @file_get_contents($_GET['article_url']);
    // Count words in the text with all tags stripped (0 if the fetch failed)
    $test1 = ($str === false) ? 0 : str_word_count(strip_tags(strtolower($str)));
    if ($test1 >= 500) {
        echo '<div><i class="fa fa-check-square-o" style="color:green"></i> This article has '.$test1.' words.</div>';
    } else {
        echo '<div><i class="fa fa-times-circle-o" style="color:red"></i> This article has '.$test1.' words. You are required to have a minimum of 500 words.</div>';
    }
}

Next, I need to get all h1 and h2 tags from $str, test whether any of them contain the text in $title, and echo "yes" if one does or "no" if none do. I am not really sure how to go about this.

I am looking for a pure PHP way of doing this, without installing extra PHP libraries or third-party functions.

1 Answer

Please try the code below.

if (isset($_GET['article_url'])){
    $title = 'This is an example title';
    $str = @file_get_contents($_GET['article_url']);

    $document = new DOMDocument();
    // Collect libxml parse errors internally instead of printing a warning
    // for every tag its HTML parser does not recognize (header, nav, ...)
    libxml_use_internal_errors(true);
    $document->loadHTML($str);
    libxml_clear_errors();

    $tags = array('h1', 'h2');
    $texts = array();
    foreach($tags as $tag)
    {
      //Fetch all elements in the DOM with this tag name
      $elementList = $document->getElementsByTagName($tag);
      foreach($elementList as $element)
      {
         //Store the trimmed, lowercased text in one flat array
         $texts[] = strtolower(trim($element->textContent));
      }
    }
    //Check whether the title exactly matches one of the stored heading texts
    if(in_array(strtolower($title), $texts)){
        echo "yes";
    }else{
        echo "no";
    }
}
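The in_array() check above succeeds only when a heading's text is exactly the title, while the question asks whether any h1 or h2 *contains* it. Below is a minimal sketch of a substring check using stripos(); headingContainsTitle() is a hypothetical helper name, not a PHP built-in. Note that stripos() only folds ASCII letters, so titles with non-ASCII characters would need mb_stripos() from the mbstring extension.

```php
<?php
// Sketch: does any <h1>/<h2> in $html contain $title (case-insensitive)?
// headingContainsTitle() is a hypothetical helper, not a PHP built-in.
function headingContainsTitle(string $html, string $title): bool
{
    if ($html === '' || $title === '') {
        return false; // loadHTML() rejects an empty string
    }

    $document = new DOMDocument();
    // Keep libxml parse warnings (e.g. for HTML5 tags) out of the output
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['h1', 'h2'] as $tag) {
        foreach ($document->getElementsByTagName($tag) as $element) {
            // stripos() returns false when the needle is absent, so compare
            // strictly: a match at position 0 is still a match
            if (stripos($element->textContent, $title) !== false) {
                return true;
            }
        }
    }
    return false;
}

$html = '<html><body><header><h1>Welcome</h1></header>'
      . '<h2>  Contact Us  </h2><p>Contact us today</p></body></html>';

echo headingContainsTitle($html, 'contact us') ? 'yes' : 'no'; // yes
echo "\n";
echo headingContainsTitle($html, 'today') ? 'yes' : 'no';      // no
```

Because the search is a substring match, surrounding whitespace in the heading no longer matters, and text outside h1/h2 (the paragraph here) is ignored.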

7 Comments

Unfortunately, this spits out countless DOM errors, such as "Warning: DOMDocument::loadHTML(): Tag header invalid in Entity, line: 146". It also always returns "no", even when it should return "yes".
Can you please put print_r($texts) before the if condition and check the output?
The output looks like this: Array ( [h1] => Array ( [0] => the startup business success plan ) [h2] => Array ( [0] => about us [1] => your board [2] => the services [3] => membership fees [4] => partners [5] => startmeeting® [6] => entrepreneur power hour [7] => the downliner [8] => adsactly hits [9] => contact us ) ) followed by "no". And "no" is output even if I use the text "contact us".
I have tried rtrim, ltrim and trim; all of them still return false 100% of the time.
There is a character-encoding issue when getting the string with file_get_contents(). When there are special characters in a string, the string doesn't match correctly, which is very strange. To sort it out, I have tried strip_tags, html_entity_decode and htmlspecialchars_decode.
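The comments point to three separate problems. The print_r() output shows $texts grouped by tag ([h1] => Array ( ... )), so in_array() compares the title against arrays rather than strings and always fails. trim() strips only the ends of a string and does not remove non-breaking spaces (U+00A0, which is what &nbsp; becomes) or internal line breaks. And DOMDocument::loadHTML() commonly decodes HTML with no recognized charset declaration as ISO-8859-1, which garbles UTF-8 characters such as "®". The following is a minimal sketch addressing all three, assuming the fetched page is UTF-8. The helper names are hypothetical, and prepending a charset meta tag is a widely used workaround rather than a documented DOMDocument option.

```php
<?php
// Sketch, assuming the fetched page is UTF-8 encoded.
// loadUtf8Html(), normalizeHeading() and collectHeadings() are
// hypothetical helper names, not PHP built-ins.

function loadUtf8Html(string $html): DOMDocument
{
    $document = new DOMDocument();
    // Keep parse warnings such as "Tag header invalid" out of the output
    $previous = libxml_use_internal_errors(true);
    // Prepending a charset declaration (a common workaround) stops libxml
    // from decoding the bytes as ISO-8859-1, which would garble "®"
    $document->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $document;
}

function normalizeHeading(string $text): string
{
    // Collapse whitespace runs, including U+00A0 produced by &nbsp;
    // (which trim() does not strip), into single ASCII spaces
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;
    return strtolower(trim($collapsed));
}

/** @return string[] one flat list of normalized h1/h2 texts */
function collectHeadings(string $html): array
{
    $texts = [];
    if ($html === '') {
        return $texts; // loadHTML() rejects an empty string
    }
    $document = loadUtf8Html($html);
    foreach (['h1', 'h2'] as $tag) {
        foreach ($document->getElementsByTagName($tag) as $element) {
            // Append to one flat list, not $texts[$tag][], so that
            // in_array() compares strings rather than nested arrays
            $texts[] = normalizeHeading($element->textContent);
        }
    }
    return $texts;
}

$html = "<header><h1>The Startup\nBusiness Success Plan</h1></header>"
      . "<h2>Contact&nbsp;Us</h2><h2>StartMeeting\u{00AE}</h2>";
$headings = collectHeadings($html);

echo in_array(normalizeHeading('Contact Us'), $headings, true) ? 'yes' : 'no'; // yes
```

Normalizing the title with the same function as the headings keeps both sides of the comparison consistent; for a "contains" test rather than an exact match, the in_array() call could be replaced by a loop using strpos().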