Scraping Multiple Data Tables at once in Python

Question

I am using the following NCAA stats site and want to scrape data from it:

https://stats.ncaa.org/rankings/change_sport_year_div

To get to the specific data I want to scrape, click the link, choose the sport Men's Basketball, year 2019-2020, and Division III, then click the team stats button. After digging through the HTML, I was able to find all of the stats I want in the drop-down menu. Is there a way to use BeautifulSoup (or perhaps even pd.read_html()) to scrape a table for each category? It appears that all the info I need is in the picture below, but I'm not sure how to use Python's tools to take advantage of it. This would be far more efficient (and a lot less boring) than manually downloading the Excel sheet for each stat and reading them into pandas. Thank you.

BcK · Accepted Answer · 2020-04-11 22:38:36Z

Inspecting the page, you need to make a POST request to the given URL with the following form data:

sport_code: MBB
academic_year: 2020.0
division: 3.0
ranking_period: 110.0
team_individual: T
game_high: N
ranking_summary: N

sport_code=MBB&academic_year=2020.0&division=3.0&ranking_period=110.0&team_individual=T&game_high=N&ranking_summary=N

Save the form data in the URL-encoded format shown above to a file (formdata.txt here) and call curl:

curl -X POST -d "@formdata.txt" https://stats.ncaa.org/rankings/change_sport_year_div
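The URL-encoded string does not have to be typed by hand. As a minimal sketch using only the standard library, the field list above can be turned into that string with urllib.parse.urlencode; the variable names form_data and body are just illustrative.

```python
from urllib.parse import urlencode

# Form fields copied from the list above, in the same order.
form_data = {
    "sport_code": "MBB",
    "academic_year": "2020.0",
    "division": "3.0",
    "ranking_period": "110.0",
    "team_individual": "T",
    "game_high": "N",
    "ranking_summary": "N",
}

# urlencode joins the pairs as key=value, separated by "&".
body = urlencode(form_data)
print(body)
```

Redirecting the printed string into formdata.txt gives the file that the curl command reads.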

If you prefer, you can do the same thing with the requests module; just make sure the form data is in the correct format.

import requests

# Send the form fields as a URL-encoded POST body.
r = requests.post("https://stats.ncaa.org/rankings/change_sport_year_div",
                  data={"sport_code": "MBB",
                        "academic_year": 2020.0,
                        "division": 3.0,
                        "ranking_period": 110.0,
                        "team_individual": "T",
                        "game_high": "N",
                        "ranking_summary": "N"})
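To get from the response to one table per category, a rough sketch along the following lines might work. It rests on several assumptions that are not confirmed here: that the server selects the category from a stat_seq form field (a field with that name appears in the browser's form data), that the response HTML contains the table, and that no extra fields such as authenticity_token are required. The helper names build_payload and fetch_tables are hypothetical, and pd.read_html needs an HTML parser such as lxml installed.

```python
import io

URL = "https://stats.ncaa.org/rankings/change_sport_year_div"

# Base form fields, taken from the answer above.
BASE_FORM = {
    "sport_code": "MBB",
    "academic_year": 2020.0,
    "division": 3.0,
    "ranking_period": 110.0,
    "team_individual": "T",
    "game_high": "N",
    "ranking_summary": "N",
}


def build_payload(stat_seq):
    """Return a copy of the base form data with one stat category added.

    Assumption: the category is chosen through the "stat_seq" field.
    """
    payload = dict(BASE_FORM)
    payload["stat_seq"] = stat_seq
    return payload


def fetch_tables(stat_seq):
    """POST the form for one category and parse every <table> in the reply."""
    # Imported here so build_payload can be used without third-party packages.
    import pandas as pd
    import requests

    response = requests.post(URL, data=build_payload(stat_seq), timeout=30)
    response.raise_for_status()
    # read_html returns a list of DataFrames and raises ValueError
    # if the page contains no tables.
    return pd.read_html(io.StringIO(response.text))
```

Calling fetch_tables once for each stat_seq value found in the drop-down menu would then give a list of DataFrames per category. If the site rejects the request, the remaining fields shown in the browser's Form Data may need to be added to the payload.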

edited Apr 11, 2020 at 22:38

answered Apr 11, 2020 at 21:47

BcK


7 Comments

bismo Over a year ago

How did you find a token and those parameters? I can't seem to find an API/API documentation for this site.

BcK Over a year ago

There is no API on these kinds of websites. Open Chrome DevTools and look at the Network tab. Click the request with type "document"; the Form Data is listed under Headers.

bismo Over a year ago

I'm kind of confused on how to format the code in your original response. What do I do with all of those variables? I was able to find them on my machine and have my own token. It is listed as a dictionary. I have imported requests already.

bismo Over a year ago

Also, the dictionary won't run because the key 'stat_seq' doesn't have a value pair.

bismo Over a year ago

When I run the following code I get an error:

r = requests.post('<stats.ncaa.org/rankings/change_sport_year_div>', data = {'sport_code': 'MBB', 'academic_year': 2020.0, 'division': 3.0, 'ranking_period': 110.0, 'team_individual': 'T', 'game_high': 'N', 'ranking_summary': 'N', 'org_id': -1, 'stat_seq': '', 'conf_id': -1, 'region_id': -1, 'ncaa_custom_rank_summary_id': -1, 'user_custom_rank_summary_id': -1, 'authenticity_token': '1qX+xH/PoudepD8bA4ZV+3sObi98u2rI59KoqHH1B00='})

InvalidSchema: No connection adapters were found
