I am trying to fetch and then print (or at least be able to use) the source code of any website using PHP. I am not very experienced, and I am now wondering whether I need JS to accomplish this. So far, the code below fetches a web page, but the browser renders it as a page. What I want instead is to display the source code. Most importantly, I want to store the source code in a variable so I can use it later, and eventually read it line by line, but that can be tackled later.

$url = 'http://www.google.com';
function get_data($url) 
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo get_data($url); //print and echo do the same thing in this scenario.

2 Comments

  • You can use document.body.innerHTML, pretty simple, isn't it? Commented Dec 27, 2012 at 22:35
  • Can you explain this further? Maybe give an example? @gdoron Commented Dec 27, 2012 at 23:13

5 Answers

Consider using file_get_contents() instead of curl. You can then display the code on your page by replacing every opening bracket (<) with &lt; and then outputting it to the page.

<?php
$code = file_get_contents('http://www.google.com');
$code = str_replace('<', '&lt;', $code);
echo $code;
?>

Edit:
Looks like curl is actually faster than FGC, so ignore that suggestion. The rest of my post still stands. :)
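
One caveat with replacing only `<`: ampersands stay as they are, so entities in the fetched source (such as `&amp;`) are still decoded by the browser and the output is not quite the original source. A minimal sketch of the difference, assuming `htmlspecialchars()` is acceptable here; `escape_for_display()` is a hypothetical helper name and the sample string stands in for a fetched page:

```php
<?php
// Hypothetical helper: escape fetched markup so a browser shows it as text.
function escape_for_display(string $source): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($source, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$code = '<a href="/terms">T&amp;C</a>'; // stand-in for the fetched page source

// Only "<" replaced: the browser still decodes &amp; and shows "T&C".
echo str_replace('<', '&lt;', $code), "\n";

// Fully escaped: the browser shows the source exactly, including "&amp;".
echo escape_for_display($code), "\n";
```

The raw string in `$code` is left untouched, so it can still be stored and processed later; only the copy sent to the browser is escaped.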

3 Comments

Set the content-type as plain-text?
+1 I did not know cURL was faster than file_get_contents for remote connections.
@SenorAmor It prints out 'Resource id #1' and that's it!

You should try to print the result between <pre></pre> tags:

echo '<pre>' . get_data($url) . '</pre>';

2 Comments

This didn't seem to work. It only changed some of the fonts and formatting of the displayed web page.
Maybe echo '<pre>'; print_r(get_data($url)); echo '</pre>';

I rewrote your function. It can return the source with or without line numbers.

<?php
function get_data($url, $Addlines = false){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the body instead of printing it
    $content = curl_exec($ch);
    curl_close($ch);

    if ($content === false){
        return ''; // The request failed, so there is nothing to show
    }
    $content = htmlspecialchars($content); // Prevents the browser from parsing the HTML

    if ($Addlines == true){
        $lines = ''; // Initialise before appending to avoid an undefined-variable notice
        $Count = 0;
        foreach (explode("\n", $content) as $Line){
            $lines .= 'Line '.$Count.': '.$Line.'<br />';
            $Count++;
        }
        return $lines;
    } else {
        return nl2br($content);
    }
}


echo get_data('https://www.google.com/', true); // Source code with line numbers
echo get_data('https://www.google.com/'); // Source code without line numbers
?>

Hope it gets you on your way.
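
If the goal is to keep the source in a variable and walk through it line by line later, it may help to separate the splitting from the display. The following is only a sketch under that assumption: `source_to_lines()` and `number_lines()` are hypothetical helper names, and the sample string stands in for whatever `curl_exec()` returned.

```php
<?php
// Hypothetical helper: split raw source into lines, accepting \r\n, \r and \n.
function source_to_lines(string $source): array
{
    return preg_split('/\r\n|\r|\n/', $source);
}

// Hypothetical helper: escape each line for display and prefix a 1-based number.
function number_lines(array $lines): string
{
    $out = '';
    foreach ($lines as $i => $line) {
        $out .= ($i + 1) . ': ' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "<br />\n";
    }
    return $out;
}

$source = "<html>\r\n<body>Hi</body>\r\n</html>"; // stand-in for the fetched page
$lines  = source_to_lines($source);               // raw, unescaped lines for later use

echo number_lines($lines);
```

Here `$lines` holds the unescaped source, so it can be searched or parsed as is; escaping happens only when the lines are printed.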

Add a header Content-Type: text/plain

header("Content-Type: text/plain"); 
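
Note that header() only takes effect if it is called before any output has been sent. A rough sketch of how this could be wrapped up, where `send_as_plain_text()` is a hypothetical helper name and the string passed in stands in for the fetched source:

```php
<?php
// Hypothetical helper: send the given source to the browser as plain text.
function send_as_plain_text(string $source): void
{
    // Headers can only be set while no output has been sent yet.
    if (!headers_sent()) {
        header('Content-Type: text/plain; charset=utf-8');
    }
    echo $source; // No escaping needed: the browser will not parse plain text as HTML
}
```

A call such as `send_as_plain_text(get_data($url));` would then replace the plain `echo`. With this approach the whole response is plain text, so it does not suit a page that mixes the source with other HTML.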

Use htmlspecialchars() in PHP to print the source code.

In your code, use

return htmlspecialchars($data);

instead of

return $data;
