
I know I could use XPath, but it won't work in this case because the site's navigation is too complex.

I can only use the source code.

I have searched all over the place and couldn't find a simple PHP solution that would:

  1. Open the HTML source code page (I already have an exact source code page URL).
  2. Select and extract the text between two known markers. They are not a matching pair of div tags, but I know the exact start and end strings.

So, basically, I need to extract the text between

knownhtmlcodestart> Text to extract <knownhtmlcodeend

What I'm trying to achieve in the end is this:

  1. Go to a source code URL.
  2. Extract the text between the two markers.
  3. Store the data temporarily on my web server in a simple text file (I would set manually how long it is kept).
  4. Define the waiting time and then repeat the whole process again.

The website I'm going to extract data from changes dynamically, so each run would write the new data into the same file.

Then I would use that data (but that's a question for another time).

I would appreciate it if anyone could lead me to a simple solution.

I'm not asking anyone to write the code for me, but if someone has done something similar, sharing it here would be helpful.

Thanks


2 Answers

1

I (shamefully) found the following function useful for extracting stuff from HTML. Regexes are sometimes too unwieldy for pulling out large chunks, e.g. a whole <table>.

/*
   $start  - string marking the start of the sequence you want to extract
   $end    - string marking the end of it; null means "up to the end of $str"
   $offset - starting position in case you need to find multiple occurrences
   returns the string between `$start` and `$end` plus the indexes of its start
   and end, or false if either marker is not found
*/
function strExt($str, $start, $end = null, $offset = 0)
{
    $p1 = mb_strpos($str, $start, $offset);
    if ($p1 === false) return false;
    $p1 += mb_strlen($start);   // first character after the start marker

    // Search from $p1 (not $p1 + 1) so an empty match between adjacent markers is still found.
    $p2 = $end === null ? mb_strlen($str) : mb_strpos($str, $end, $p1);
    if ($p2 === false) return false;

    return [
        'str'   => mb_substr($str, $p1, $p2 - $p1),
        'start' => $p1,
        'end'   => $p2,
    ];
}
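
The `$offset` parameter is there for finding multiple occurrences. As a rough, self-contained sketch of that idea, the loop below collects every substring between two markers. The helper name `extractAll` and the sample `<td>` markup are made up for illustration and are not part of the function above.

```php
<?php
// Hypothetical helper: collect every substring found between $start and $end.
function extractAll(string $str, string $start, string $end): array
{
    $found = [];
    if ($start === '' || $end === '') {
        return $found;                       // empty markers would never advance
    }
    $offset = 0;
    while (($p1 = mb_strpos($str, $start, $offset)) !== false) {
        $p1 += mb_strlen($start);            // first character after the start marker
        $p2 = mb_strpos($str, $end, $p1);    // next end marker at or after that point
        if ($p2 === false) {
            break;                           // start marker without an end marker: stop
        }
        $found[] = mb_substr($str, $p1, $p2 - $p1);
        $offset = $p2 + mb_strlen($end);     // resume the search after the end marker
    }
    return $found;
}

// Sample input, invented for the example.
$html = '<td>one</td><td>two</td>';
print_r(extractAll($html, '<td>', '</td>'));
```

Like the function above, this relies on the mbstring extension being enabled.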

1

This assumes the opening and closing tags are on the same line (as in your example). If the tags can be on separate lines, it wouldn't be difficult to adapt this.

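$start = "knownhtmlcodestart>";
$end   = "<knownhtmlcodeend";

// Placeholder URL; the scheme is required, otherwise PHP looks for a local file.
$html = file_get_contents('http://website.com');
if ($html === false) {
    exit('Could not fetch the page');
}

$text = null;
foreach (explode("\n", $html) as $line) {
    $t1 = strpos($line, $start);
    if ($t1 === false)      // compare with false: a match at position 0 is valid
        continue;
    $t1 += strlen($start);  // skip past the start marker itself

    $t2 = strpos($line, $end, $t1);
    if ($t2 === false)
        continue;

    $text = substr($line, $t1, $t2 - $t1);
    break;
}

echo $text;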
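
Steps 3 and 4 of the question (keep the result in a text file for a set time, then refresh it) are not covered by the snippet. A minimal sketch of one way to do it follows, assuming the file's modification time is good enough as the timer. The helper name `cachedExtract`, the cache file name, and the 300-second lifetime are all hypothetical, and the producer here is a stand-in closure rather than a real fetch.

```php
<?php
// Hypothetical helper: return the cached text while it is fresh, otherwise rebuild it.
// $produce is any callable that fetches the page and returns the extracted text,
// or null on failure. $ttl is the cache lifetime in seconds.
function cachedExtract(string $cacheFile, int $ttl, callable $produce): ?string
{
    clearstatcache(true, $cacheFile);        // make sure filemtime() is not stale
    $hasCache = is_file($cacheFile);

    if ($hasCache && time() - filemtime($cacheFile) < $ttl) {
        $cached = file_get_contents($cacheFile);
        return $cached === false ? null : $cached;   // still fresh: reuse stored text
    }

    $text = $produce();
    if ($text === null) {
        // Fetch or extraction failed: fall back to the old data if there is any.
        $cached = $hasCache ? file_get_contents($cacheFile) : false;
        return $cached === false ? null : $cached;
    }

    file_put_contents($cacheFile, $text, LOCK_EX);   // overwrite the same file
    return $text;
}

// Stand-in producer for the example. In practice the closure would download the
// source page and extract the text between the two known markers.
$file = sys_get_temp_dir() . '/extracted.txt';
echo cachedExtract($file, 300, fn() => 'Text to extract');
```

Calling this on each page load refreshes the file lazily once the lifetime has passed. Running the script from a scheduled job (for example cron) would be another option if the refresh has to happen without visitors.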
