1

I am trying to grab the meta data from a news article on the NY Times website, specifically http://www.nytimes.com/2014/06/25/us/politics/thad-cochran-chris-mcdaniel-mississippi-senate-primary.html

Whenever I try however I am getting redirects from the sight because my "browser" does not accept cookies. I have enabled the curl options to save cookies and tried following the accepted answers in a few other StackOverflow questions (here, here, and here) and while the answer worked on those websites it doesn't seem to work on the nytimes site.

My current php curl function looks like this:

function get_extra_meta_tags_curl($url) {
    $ckfile = tempnam("/public_html/commentarium/", "cookies.txt");

    $ch = curl_init($main_url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $output = curl_exec($ch);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $output = curl_exec($ch);
    curl_close($ch);

    echo $output;
}

The problem appears to be that when I request the URL, nytimes.com checks if the browser accepts cookies. I checks a couple of times before redirecting to the login page with a REFUSE_COOKIE_ERROR. Instead of posting the full redirect list here you can see it on my test page here along with the raw html that the final redirect returns and what my current get_extra_meta_tags_curl function is returning under CURL test

Thanks for any help!

1 Answer 1

1

You enable cookies auto-handling in wrong manner. CURLOPT_COOKIEJAR only enables cookies saving (storing), but you need also enable cookies loading and passing them with request (by CURLOPT_COOKIEFILE option). Otherwise cookies auto-handling won't work and you will experienced mentioned "Browser does not accept cookies" problem.

So you have to set both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options to the same value ($ckfile) at each CURL request:

...
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
...
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks, I have tried that as was suggested in one of the other stackoverflow questions but I am still getting the same response. I have updated my question to reflect your suggestion as well as the test page.
@Russell Winkler Read the answer attentively. You have to set both options at each CURL request
I apologize, that was my error in updating the post. I did add the CURLOPT_COOKIEFILE in both requests and have now updated the post appropriately.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.