I am using the following NCAA stats site and want to scrape data from it:

https://stats.ncaa.org/rankings/change_sport_year_div

To reach the data I want to scrape, open the link, choose the sport Men's Basketball, the year 2019-2020, and Division III, then click the team stats button. After digging through the HTML, I found all of the stats I want in the drop-down menu. Is there a way to use BeautifulSoup (or perhaps even pd.read_html()) to scrape a table for each category? It appears that all the info I need is in the picture below, but I'm not sure how to use Python's tools to take advantage of it. This would be far more efficient (and a lot less boring) than manually downloading the Excel sheet for each stat and reading them into pandas. Thank you.

enter image description here

1 Answer

Looking at the requests the page makes, you should send a POST request to the given URL with the following form data:

sport_code: MBB
academic_year: 2020.0
division: 3.0
ranking_period: 110.0
team_individual: T
game_high: N
ranking_summary: N

sport_code=MBB&academic_year=2020.0&division=3.0&ranking_period=110.0&team_individual=T&game_high=N&ranking_summary=N
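If you would rather not type the URL-encoded string by hand, here is a minimal sketch that builds it from the fields listed above using only the standard library. The variable names `form_data` and `encoded` are just illustrative; the field values are copied as-is from the list above.

```python
from urllib.parse import urlencode

# Form fields copied from the list above, with values kept as the browser sent them.
form_data = {
    "sport_code": "MBB",
    "academic_year": 2020.0,
    "division": 3.0,
    "ranking_period": 110.0,
    "team_individual": "T",
    "game_high": "N",
    "ranking_summary": "N",
}

# urlencode joins the pairs as key=value separated by "&", escaping where needed.
encoded = urlencode(form_data)
print(encoded)
```

The printed string should match the one above, so it can be written to a file or passed to another tool unchanged.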

Save the form data in the URL-encoded format shown above to a file (formdata.txt in this example) and pass it to curl:

curl -X POST -d "@formdata.txt" https://stats.ncaa.org/rankings/change_sport_year_div

If you prefer, you can do the same thing with the requests module; just make sure the form data is passed as a dictionary in the correct format.

import requests

# POST the form fields; requests URL-encodes the dictionary for you.
r = requests.post("https://stats.ncaa.org/rankings/change_sport_year_div",
                  data={"sport_code": "MBB",
                        "academic_year": 2020.0,
                        "division": 3.0,
                        "ranking_period": 110.0,
                        "team_individual": "T",
                        "game_high": "N",
                        "ranking_summary": "N"})
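Since the question asks for one table per stat category, here is a rough sketch of how the response could be turned into DataFrames. It rests on several assumptions that I have not verified against the site: that the response HTML contains the rankings as a `<table>` element, that the category is selected through the `stat_seq` form field mentioned in the comments, and that no extra fields (such as `authenticity_token`) are required. `build_payload` and `fetch_tables` are hypothetical helper names, and `pd.read_html` needs an HTML parser such as lxml installed.

```python
import io

URL = "https://stats.ncaa.org/rankings/change_sport_year_div"

# Base form fields taken from the answer above.
BASE_FORM = {
    "sport_code": "MBB",
    "academic_year": 2020.0,
    "division": 3.0,
    "ranking_period": 110.0,
    "team_individual": "T",
    "game_high": "N",
    "ranking_summary": "N",
}


def build_payload(stat_seq):
    """Return a copy of the base form with a stat category added.

    Assumption: the category is chosen through a `stat_seq` field.
    """
    payload = dict(BASE_FORM)
    payload["stat_seq"] = stat_seq
    return payload


def fetch_tables(stat_seq):
    """POST the form and return every HTML table in the response as a DataFrame."""
    # Third-party imports live here so build_payload works without them installed.
    import pandas as pd
    import requests

    response = requests.post(URL, data=build_payload(stat_seq), timeout=30)
    response.raise_for_status()
    # read_html returns a list of DataFrames, one per <table> element,
    # and raises ValueError if the page contains no tables.
    return pd.read_html(io.StringIO(response.text))
```

Calling `fetch_tables(stat_seq)` once for each ID taken from the page's drop-down menu, and storing the results in a dictionary keyed by that ID, would give one set of tables per category. The IDs themselves have to come from the page; none are shown here.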
7 Comments

How did you find a token and those parameters? I can't seem to find an API/API documentation for this site.
Websites like this don't have an API. Open Chrome DevTools and look at the Network tab. Click the request of type "document"; the Form Data is listed under Headers.
I'm kind of confused on how to format the code in your original response. What do I do with all of those variables? I was able to find them on my machine and have my own token. It is listed as a dictionary. I have imported requests already.
Also, the dictionary won't run because the key 'stat_seq' doesn't have a value pair.
When I run the following code: r = requests.post('<stats.ncaa.org/rankings/change_sport_year_div>', data = {'sport_code': 'MBB', 'academic_year': 2020.0, 'division': 3.0, 'ranking_period': 110.0, 'team_individual': 'T', 'game_high': 'N', 'ranking_summary': 'N', 'org_id': -1, 'stat_seq': '', 'conf_id': -1, 'region_id': -1, 'ncaa_custom_rank_summary_id': -1, 'user_custom_rank_summary_id': -1, 'authenticity_token': '1qX+xH/PoudepD8bA4ZV+3sObi98u2rI59KoqHH1B00='}) I get this error: InvalidSchema: No connection adapters were found
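A likely cause, assuming the URL quoted in the comment above is what was actually passed: requests picks a connection adapter by URL prefix and by default only handles URLs that begin with http:// or https://, while the quoted URL is wrapped in angle brackets and has no scheme. A small standard-library check like the sketch below can catch that before the request is sent. `has_http_scheme` is a hypothetical helper, not part of requests.

```python
from urllib.parse import urlsplit


def has_http_scheme(url):
    """Return True if url has an http or https scheme and names a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The URL as quoted in the comment: angle brackets, no scheme.
print(has_http_scheme("<stats.ncaa.org/rankings/change_sport_year_div>"))

# The URL as written in the answer.
print(has_http_scheme("https://stats.ncaa.org/rankings/change_sport_year_div"))
```

Passing the full URL from the answer, including https:// and without the angle brackets, should get past this particular error.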