I am trying to scrape data with PHP, but the URL I need to access requires POST data.

<?php 

//set POST variables
$url = 'https://www.ncaa.org/';
//$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This sets the number of fields to post
curl_setopt($curl,CURLOPT_POST, sizeof($data_to_post));

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);

?>

When I try the second $url, where the actual information is hosted, the request returns "failed to load response data", but the NCAA home page loads fine. Why do I get "failed to load response data" even though I am sending the correct form data?

  • Judging by your second URL, it seems to require the GET method, not POST. Commented Jul 4, 2016 at 5:06
  • If you open the website and enter the high school code from my example, the form submits it via POST. Commented Jul 4, 2016 at 5:07

2 Answers

1

The site apparently checks for a recognized user agent. By default PHP curl doesn't send a User-Agent header. Add

curl_setopt($curl, CURLOPT_USERAGENT, 'curl/7.21.4');

and the script returns a response. However, in this case, the response says that it requires a newer browser than the one you have. So you should copy the user agent string from a real browser, e.g.

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

Also, it requires the parameters to be sent in application/x-www-form-urlencoded format. When you use an array as the argument to CURLOPT_POSTFIELDS it uses multipart/form-data. So change that line to:

curl_setopt($curl,CURLOPT_POSTFIELDS, http_build_query($data_to_post));

to convert the array to a URL-encoded string.
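As a small illustration of that conversion, the sketch below encodes a subset of the question's fields and prints the resulting request body. The subset is chosen only for brevity; note that empty strings are still sent as empty parameters.

```php
<?php
// A subset of the form fields from the question, for illustration only.
$data_to_post = [
    'hsCode'         => '332680',
    'state'          => '',
    'hsActionSubmit' => 'Search',
];

// http_build_query() turns the array into an
// application/x-www-form-urlencoded string.
$encoded = http_build_query($data_to_post);

echo $encoded, PHP_EOL; // prints: hsCode=332680&state=&hsActionSubmit=Search
```

Passing this string to CURLOPT_POSTFIELDS makes cURL send it as the body with a Content-Type of application/x-www-form-urlencoded, whereas passing the array itself results in multipart/form-data.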

And in the URL, leave out ?hsActionSubmit=searchHighSchool, as that parameter is sent in the POST fields.

The final, working script looks like this:

<?php
//set POST variables
//$url = 'https://www.ncaa.org/';
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// Send the request as an HTTP POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// Send the fields as a URL-encoded string (application/x-www-form-urlencoded)
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data_to_post));

// Send a browser-like User-Agent header
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

// Execute the POST. Without CURLOPT_RETURNTRANSFER, curl_exec() prints the
// response directly and returns true on success.
$result = curl_exec($curl);

//close the connection
curl_close($curl);
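
If the request stops working later, it can help to capture the response and any transport error instead of letting cURL print the output. The following is only a sketch: post_form() and the keys of the array it returns are hypothetical names, and the timeout values are arbitrary assumptions, not something the site is known to require.

```php
<?php
// Hypothetical helper (not part of the original answer): POST a form and
// report the HTTP status, the cURL error message, and the response body.
function post_form(string $url, array $fields, string $userAgent): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 10,   // assumed values, adjust as needed
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body   = curl_exec($curl);   // false when the transfer itself failed
    $error  = curl_error($curl);  // empty string when no error occurred
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE); // 0 if no response
    // The handle is released when $curl goes out of scope.

    return [
        'ok'     => $body !== false && $status >= 200 && $status < 300,
        'status' => $status,
        'error'  => $error,
        'body'   => $body === false ? null : $body,
    ];
}
```

Calling post_form($url, $data_to_post, $userAgent) and inspecting the 'status' and 'error' entries should show whether the failure is a transport problem (for example a TLS or DNS error) or an HTTP-level rejection by the server.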

3 Comments

Thank you very much. This works exactly as I need it to. This will help my research about php cURL.
Sorry to be a pain but now the script is not working?
Why would this be if nothing has changed?
0

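For HTTPS URLs, cURL can fail when it cannot verify the server's certificate. Turning off the CURLOPT_SSL_VERIFYPEER option works around that, although it also disables certificate checking: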

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// Disable peer certificate verification
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

// Send the request as an HTTP POST (CURLOPT_POST is a boolean flag, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
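
Because switching verification off also removes protection against impersonated servers, an alternative worth trying is to leave it on and tell cURL where to find a CA bundle. This is a sketch under assumptions: tls_options() is a hypothetical helper, and the bundle path is a placeholder you would replace with a real file on your system.

```php
<?php
// Hypothetical helper: TLS-related cURL options that keep verification on.
// $caBundlePath is an assumption; pass the location of a CA bundle file,
// or null to rely on the system default.
function tls_options(?string $caBundlePath = null): array
{
    $options = [
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the certificate matches the host name
    ];

    if ($caBundlePath !== null) {
        $options[CURLOPT_CAINFO] = $caBundlePath;
    }

    return $options;
}
```

The returned array can be applied to an existing handle with curl_setopt_array($curl, tls_options('/path/to/cacert.pem')), where the path shown is purely illustrative.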
