
Here is my code:

function get_data($url)
{
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // random_user_agent() is my own helper (not shown here)
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++)
{
    $post = rtrim(fgets($urllist));
    echo $post;
    $html = get_data($post);
    echo $html;
}
fclose($urllist);

Problem: when I call get_data("http://url.com") with the URL written directly, $html contains the right data. But when I pass the URL in a variable, $html is empty.

$post holds the right URL; I checked it. Isn't get_data($post); the right way to call the function?

The cURL info (curl_getinfo()) gives me this:

array(20) { 
["url"]=> string(68) "http://secret-url.com" 
["content_type"]=> string(9) "text/html" 
["http_code"]=> int(301) 
["header_size"]=> int(255) 
["request_size"]=> int(340) 
["filetime"]=> int(-1) 
["ssl_verify_result"]=> int(0) 
["redirect_count"]=> int(0) 
["total_time"]=> float(0.095589) 
["namelookup_time"]=> float(0.012224) 
["connect_time"]=> float(0.049399) 
["pretransfer_time"]=> float(6.5E-5) 
["size_upload"]=> float(0) 
["size_download"]=> float(0) 
["speed_download"]=> float(0) 
["speed_upload"]=> float(0) 
["download_content_length"]=> float(0) 
["upload_content_length"]=> float(0) 
["starttransfer_time"]=> float(0.095534) 
["redirect_time"]=> float(0) 
}
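The output above already points at the cause: `http_code` is 301 (a redirect), `redirect_count` is 0 and `size_download` is 0, so cURL received only the redirect response and did not follow it. The small helper below is a sketch for spotting this case quickly; `explain_curl_info()` is a hypothetical name, not a PHP function, and it only reads the standard `http_code`, `redirect_count` and `size_download` keys of the array returned by `curl_getinfo()`.

```php
<?php
/**
 * Hypothetical diagnostic helper: summarise an array returned by curl_getinfo().
 * Only the keys 'http_code', 'redirect_count' and 'size_download' are used.
 */
function explain_curl_info(array $info): string
{
    $code      = (int) ($info['http_code'] ?? 0);
    $redirects = (int) ($info['redirect_count'] ?? 0);
    $bytes     = (float) ($info['size_download'] ?? 0);

    if ($code === 0) {
        return 'No HTTP response received; check curl_error().';
    }
    // A 3xx status with no redirects followed usually means the body is
    // empty because CURLOPT_FOLLOWLOCATION was not enabled.
    if ($code >= 300 && $code < 400 && $redirects === 0) {
        return "Got HTTP $code redirect that was not followed; enable CURLOPT_FOLLOWLOCATION.";
    }
    if ($bytes <= 0.0) {
        return "Got HTTP $code with an empty body.";
    }
    return "Got HTTP $code with $bytes bytes.";
}

// Values copied from the curl_getinfo() output above.
echo explain_curl_info([
    'http_code'      => 301,
    'redirect_count' => 0,
    'size_download'  => 0.0,
]), "\n";
```

For the values in the question this prints `Got HTTP 301 redirect that was not followed; enable CURLOPT_FOLLOWLOCATION.`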
  • The problem, then, is that your variable must be empty (null). The function doesn't care whether you pass a string literal or a variable holding a string. It looks like your code might be fetching an empty line at the end of the file; you should make sure that rtrim(fgets($urllist)) actually returns a non-empty string. Commented May 2, 2012 at 18:36
  • rtrim(fgets($urllist)) returns the URL correctly. Commented May 2, 2012 at 18:37
  • "But when I pass the url using a variable, it returns nothing." That means that whatever variable you are using is either empty, or the cURL request failed. Commented May 2, 2012 at 18:39
  • I am sure the variable holds the right string/URL. It is even echoed to check that it holds data. I tried the same string directly in the get_data function and it works. I have no clue what's wrong, and that's why I posted it on SO. :) Commented May 2, 2012 at 18:41
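The first comment's suggestion (make sure each line is non-empty before requesting it) can be handled once, up front. A minimal sketch, assuming links.txt holds one URL per line; `read_url_list()` is a hypothetical helper built on PHP's `file()`. It trims each line (including a stray `\r` from Windows line endings) and drops blank lines, so a trailing newline in the file never turns into an empty request, and it removes the need for the fixed 51-iteration `for` loop.

```php
<?php
/**
 * Hypothetical helper: read one URL per line from $path,
 * trimming whitespace and skipping blank lines.
 *
 * @return string[] list of non-empty URLs, indexed from 0
 */
function read_url_list(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read URL list: $path");
    }
    $urls = array_map('trim', $lines);
    // Keep only non-empty strings, then reindex from 0.
    return array_values(array_filter($urls, fn (string $u): bool => $u !== ''));
}

// Usage: write a small list with a blank line and CRLF endings, then read it back.
$tmp = tempnam(sys_get_temp_dir(), 'links');
file_put_contents($tmp, "http://example.com/a\r\n\r\nhttp://example.com/b\n");
print_r(read_url_list($tmp));
unlink($tmp);
```

The main loop can then be `foreach (read_url_list('links.txt') as $post) { ... }`.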

2 Answers


Try this code out.

function get_data($url)
{
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());

    // Edit: Follow redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

    $data = curl_exec($ch);
    var_dump(curl_getinfo($ch));
    curl_close($ch);
    return $data;
}

//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++)
{
    if($post = rtrim(fgets($urllist)))
    {
        echo $post;
        echo get_data($post);
    }
    else
    {
        echo "No URL provided!";
    }

    echo "\n<hr>\n";
}

Comments

You could simply have told me to check that the variable isn't null. Well, it isn't. `file_get_contents($post)` works just fine, but when I pass the variable to the function, it doesn't work.
@Kishor, checking the fgets line is only one change. I just added that to be safe. The other is getting cURL to tell you why your requests are failing.
Added the output in the question. Please check.
The HTTP status code in ["http_code"]=> int(301) is 301, which is a redirect. You should also set CURLOPT_MAXREDIRS to limit how many redirects cURL will follow.
It was the redirect indeed: `curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);`
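Putting the fix from these comments together, here is a sketch of `get_data()` that follows redirects, caps them with `CURLOPT_MAXREDIRS`, and logs cURL errors instead of silently returning an empty result. Assumptions: the question's `random_user_agent()` helper is not shown, so the user agent is a plain parameter with a made-up default, and the limit of 5 redirects is an arbitrary choice.

```php
<?php
/**
 * Fetch $url with cURL, following HTTP 3xx redirects.
 * Returns the response body, or false on failure (the error is logged).
 * $userAgent stands in for the question's random_user_agent() helper.
 */
function get_data(string $url, string $userAgent = 'Mozilla/5.0 (compatible; example-fetcher)')
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 15,    // seconds to wait while connecting
        CURLOPT_FOLLOWLOCATION => true,  // follow Location: headers (301, 302, ...)
        CURLOPT_MAXREDIRS      => 5,     // give up after 5 redirects (e.g. a loop)
        CURLOPT_USERAGENT      => $userAgent,
    ]);

    $data = curl_exec($ch);
    if ($data === false) {
        error_log(sprintf('cURL error %d for %s: %s', curl_errno($ch), $url, curl_error($ch)));
    }
    // Free the handle: curl_close() did this before PHP 8; since PHP 8 the
    // handle is freed when its last reference goes away.
    unset($ch);
    return $data;
}
```

The calling code stays the same: `$html = get_data($post);`. A `false` return now leaves a line in the PHP error log explaining why the request failed.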

Will $html = file_get_contents($url); suffice? As the comments show, it didn't. =)

EDIT, to sum up the conversation with a proper answer:

Change your cURL code to the following, which adds the CURLOPT_FOLLOWLOCATION option and, optionally, limits the number of redirects with CURLOPT_MAXREDIRS:

function get_data($url) {
    $ch = curl_init();
    $timeout = 15;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // follow Location: newurl.tld - i.e. HTTP 30X status codes
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // stop following after 5 redirects, e.g. in a redirect loop
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
    $data = curl_exec($ch);
    // if in doubt about what's going on, inspect the output of curl_getinfo($ch):
    // var_dump(curl_getinfo($ch));
    curl_close($ch);
    return $data;
}
//Grab HTML
$urllist = fopen("links.txt", "r+");
for ($j = 0; $j <= 50; $j++) {
    $post = rtrim(fgets($urllist));
    echo $post;
    $html = get_data($post);
    echo $html;
}
fclose($urllist);

Optionally, since it seems you are doing this more than once, returning to the pages in your links.txt, set up a cookie container. It lets the sites recognize that you have visited before, and the saved cookies are reused on consecutive runs:

// path to the cookie file (its directory must exist and be writable by the httpd user):
$cookie_file = "/tmp/cookie/cookie1.txt";
// save all cookies to this file when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
// read cookies from this file and send them with requests to matching hosts
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
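For context, a self-contained version of that fragment. This is a sketch under assumptions: the function name `get_data_with_cookies()` is illustrative, and the cookie path passed in must be writable by the web-server user. `CURLOPT_COOKIEFILE` makes cURL read cookies from the file (a missing file is simply ignored), while `CURLOPT_COOKIEJAR` makes it write the cookies it knows about back to the file when the handle is released.

```php
<?php
/**
 * Sketch: fetch $url while persisting cookies in $cookieFile between runs.
 * The cookie jar is written when the handle is released, so the handle
 * is dropped explicitly before returning.
 */
function get_data_with_cookies(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies saved by earlier runs
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ]);
    $data = curl_exec($ch);
    // Dropping the last reference frees the handle and flushes the jar
    // (on PHP 8+, curl_close() alone no longer frees it).
    unset($ch);
    return $data;
}
```

Reusing the same `$cookie_file` on every run is what lets the sites see the cookies from the previous visit.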

Comments

That works just fine, but I need to use cURL for this one. file_get_contents($post) works, so the variable is passing the right URL to get_data($url). What could be wrong, then? Any idea?
There's absolutely nothing wrong with your PHP script; the problem must come from something other than the posted code, or from your server. Debug it with $info = curl_getinfo($ch); var_dump($info); right after your curl_exec(), before closing the handle.
Added the output in the question. Please check.
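The debugging step suggested in these comments can be wrapped so the diagnostics come back alongside the body. This is a sketch; `fetch_with_info()` and the keys of the array it returns are made-up names. The detail that matters is the order: `curl_getinfo()`, `curl_errno()` and `curl_error()` are called after `curl_exec()` and while the handle is still open.

```php
<?php
/**
 * Sketch: fetch $url and return the body together with cURL's diagnostics.
 * Keys: 'body' (string|false), 'info' (curl_getinfo() array), 'errno', 'error'.
 */
function fetch_with_info(string $url): array
{
    $ch = curl_init($url); // curl_init() accepts the URL directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);

    // Collect diagnostics before the handle is released.
    $result = [
        'body'  => $body,
        'info'  => curl_getinfo($ch),
        'errno' => curl_errno($ch),
        'error' => curl_error($ch),
    ];
    unset($ch);
    return $result;
}

// Usage: print the status code and any error, then dump everything if needed.
// $r = fetch_with_info($post);
// echo $r['info']['http_code'], ' ', $r['error'], "\n";
// var_dump($r['info']);
```

With the question's URL, `$r['info']['http_code']` would show the 301 directly, without having to compare the literal and variable calls.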
