I'm trying to extract the Teams playing each day and the Active & Inactive players in each team's lineup. The URL for the page I'm trying to scrape is: https://stats.nba.com/lineups/. I've been using BeautifulSoup to try to get this data, and have tried a few methods to get to it, but I can't seem to extract anything within the

<div class="landing__flex-col lineups-game" data-game-state="3" nba-data-game="game" nba-with ng-include ng-repeat="game in games" src="'/lineups-template.html'">.

I want to get the teams in each matchup within each

<div class="landing__flex-col lineups-game" data-game-state="3" nba-data-game="game" nba-with ng-include ng-repeat="game in games" src="'/lineups-template.html'">,

and each player within the

<div class="columns small-6 lineups-game__team lineups-game__team--htm" nba-with nba-with-data-team="game.h" ng-include src="'/lineups-team-template.html'">.

So within the sample of html code below, I want to get the text for MEM, CHA, J. Valanciunas, and J. Crowder, and eventually do this for each player for each team.

<div class="landing__flex-row lineups-games" ng-show="isLoaded &amp;&amp; hasData" aria-hidden="false">
          <!----><!----><div class="landing__flex-col lineups-game" ng-repeat="game in games" nba-with="" nba-data-game="game" data-game-state="3" ng-include="" src="'/lineups-template.html'">
  <div class="lineups-game__inner row">

    <div class="columns small-12 lineups-game__title">
      <a href="/game/0021900154/">
        <span class="lineups-game__team-name">MEM</span>
        <span class="lineups-game__vs">vs</span>
        <span class="lineups-game__team-name">CHA</span>
        <span class="lineups-game__status hide-for-live-game">Final</span>
        <span class="lineups-game__status hide-for-pre-game hide-for-post-game">Live</span>
      </a>
    </div>

    <!----><div class="columns small-6 lineups-game__team lineups-game__team--vtm" nba-with="" nba-with-data-team="game.v" ng-include="" src="'/lineups-team-template.html'">

  <!----><!----><div ng-if="team.hasBench" nba-with="" nba-with-data-team="team" ng-include="" src="'/lineups-confirmed-roster-template.html'">
  <div class="lineups-game__header">
    <img team-logo="" class="lineups-game__team-logo team-img" abbr="MEM" type="image/svg+xml" src="/media/img/teams/logos/MEM_logo.svg" alt="Memphis Grizzlies logo" title="Memphis Grizzlies logo">
    <span class="lineups-game__team-name">MEM</span>
  </div>

  <div class="lineups-game__roster-type lineups-game__roster-type--confirmed">Active List</div>

  <ul class="lineups-game__roster lineups-game__roster--official">
    <!----><li class="lineups-game__player lineups-game__player--starter" ng-repeat="pl in team.starters">
      <a href="/player/202685/">
        <span class="lineups-game__pos">C</span>
        <span class="lineups-game__name">J. Valanciunas</span>
      </a>
    </li><!----><li class="lineups-game__player lineups-game__player--starter" ng-repeat="pl in team.starters">
      <a href="/player/203109/">
        <span class="lineups-game__pos">SF</span>
        <span class="lineups-game__name">J. Crowder</span>
      </a>

I tried by doing the following, among other methods, to no avail:

import urllib.request
import bs4 as bs

gamesSource = urllib.request.urlopen('https://stats.nba.com/lineups/').read()
gamesSoup = bs.BeautifulSoup(gamesSource,'html.parser')

teams = gamesSoup.find_all("span",{"class":"lineups-game__teams-name"})

All that ever gets returned is an empty list, and when I try to get a specific 'span' line, all that gets returned is 'None'.

Let me know what's going wrong, and what I can do to access the information I'm trying to get.

Thanks.

  • what do you want as an output? It's actually all right there with API call. Commented Nov 18, 2019 at 23:08

3 Answers

Piggy-backing off what has already been stated: since this page is generated via API/JS calls, you will need a tool that can execute JavaScript rather than a plain HTML parser. I usually go with Selenium. The code below pulls all the teams and rosters and puts them together. There may be some quirks in this code, but I think it will get you headed in the right direction:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

desired_link = 'https://stats.nba.com/lineups/'

# Run Firefox headless (no visible browser window)
fire_opts = webdriver.FirefoxOptions()
fire_opts.add_argument("-headless")
fire_path = 'geckodriver.exe'  # path to your geckodriver binary
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Firefox(options=fire_opts, service=Service(executable_path=fire_path))
# Poll up to 10 seconds for elements, since the lineups are filled in by JS
driver.implicitly_wait(10)
driver.get(desired_link)

# Team abbreviations, e.g. 'DET'
team_names_list = driver.find_elements(By.CLASS_NAME, 'lineups-game__team-name')
team_names = [name.text for name in team_names_list]

# Each roster element's text holds one "POS Name" line per player
starting_lineup_list = driver.find_elements(By.CLASS_NAME, 'lineups-game__roster--projected')
starting_lineup = [lineup.text for lineup in starting_lineup_list]

driver.quit()

for teams, players in zip(team_names,starting_lineup):
    print(teams,players)

This should output all the various teams on the page like so:

DET PG D. Rose
SG L. Kennard
SF T. Snell
PF B. Griffin
C A. Drummond

Could probably be formatted a bit better but you could throw it into a spreadsheet (or whatever you like) to use as you wish...
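As one possible way to tidy that output, here is a minimal sketch that splits each roster's text into position/name pairs. It assumes every line has the "POS Name" shape shown above; parse_lineup and the sample string are illustrative names, not part of Selenium.

```python
def parse_lineup(text):
    """Split multi-line roster text into (position, name) pairs."""
    players = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the first token is the position, the rest is the name.
        pos, _, name = line.partition(" ")
        players.append((pos, name.strip()))
    return players


# Sample text copied from the output shown above.
sample = "PG D. Rose\nSG L. Kennard\nSF T. Snell\nPF B. Griffin\nC A. Drummond"

# One (team, position, name) row per player, ready for a CSV or spreadsheet.
rows = [("DET", pos, name) for pos, name in parse_lineup(sample)]
for row in rows:
    print(row)
```

In the Selenium loop, you would call parse_lineup(players) on each roster string instead of printing it directly.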


Thanks. I was able to modify what you said to get exactly what I wanted. Very helpful!

Unfortunately, you cannot do that with urllib alone. The website in question uses JavaScript to call APIs that populate the data after the initial page load.

urllib only downloads the initial file served by the server; it cannot run any of the scripts that file would execute after its initial render in a browser.

Thus the teams = gamesSoup.find_all("span",{"class":"lineups-game__teams-name"}) call returns an empty list: the HTML you download through urllib.request does not yet contain the lineup elements. Note also that the class in the rendered HTML you posted is lineups-game__team-name (team, not teams), so that selector would not match even on the rendered page.

You can examine the API calls the website makes after the initial load (check the Network tab in your browser's developer tools) and see whether you can find where the data you want is coming from. If you are lucky, you can get that data directly through the API call. Since the page makes lots of external requests (for images and other media), you can tick XHR to show only remote API calls in the network list.

If you cannot find the API, or if it is blocked from external calls, you can alternatively try a JS-enabled browser driven from Python (e.g. Selenium) to load the page and execute its JS code.
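A quick way to confirm this diagnosis is to count how often a class appears in the raw HTML before reaching for a parser. The sketch below uses only the standard library; ClassFinder, count_class and the two HTML strings are hypothetical stand-ins (the first approximates a page before its scripts run, the second the same page after rendering), not the real responses from stats.nba.com.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.count += 1


def count_class(html, wanted):
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.count


# Hypothetical stand-in for the HTML served before any JavaScript has run.
raw_html = '<div class="landing__flex-row lineups-games" ng-show="isLoaded"></div>'
# Hypothetical stand-in for the same page after the browser rendered it.
rendered_html = raw_html + '<span class="lineups-game__team-name">MEM</span>'

print(count_class(raw_html, "lineups-game__team-name"))       # 0
print(count_class(rendered_html, "lineups-game__team-name"))  # 1
```

If the count is zero for the text returned by urllib, the elements are being added by scripts and no choice of BeautifulSoup selector will find them.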

You can get it with a direct call to the API; just change the date in the URL for the day you want. The response nests players by game and team, so you'll need to either iterate through the games/indexes or flatten the JSON and rebuild it as a dataframe. Here's an example for one team:

import pandas as pd
import requests

url = 'https://stats.nba.com/js/data/dailylineups/2019/daily_lineups_20191118.json'
jsonData = requests.get(url).json()

print(pd.DataFrame(jsonData['results'][0]['LAC']))

Output:

  firstName  lastName playerId pos rotoId team
0   Patrick  Beverley   201976  PG   3072  LAC
1   Terance      Mann  1629611  SG   4860  LAC
2     Kawhi   Leonard   202695  SF   3195  LAC
3      Paul    George   202331  PF   3114  LAC
4     Ivica     Zubac   162726   C   3888  LAC
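To cover every game on a given day, something like the sketch below could build the URL from a date and flatten the response into one row per player. It rests on two assumptions not confirmed above: that the year folder in the URL matches the calendar year of the date (hence the season_year override), and that each game in 'results' maps a team abbreviation to a list of player dicts. lineup_url and flatten_lineups are illustrative names, and the sample data is a trimmed copy of the output shown.

```python
from datetime import date


def lineup_url(day, season_year=None):
    """Build the daily lineups URL for a date."""
    # Assumption: the folder is the calendar year unless season_year is given.
    year = season_year if season_year is not None else day.year
    return ("https://stats.nba.com/js/data/dailylineups/"
            f"{year}/daily_lineups_{day:%Y%m%d}.json")


def flatten_lineups(json_data):
    """Return one dict per player, tagged with the index of its game."""
    rows = []
    for game_index, game in enumerate(json_data.get("results", [])):
        for value in game.values():
            # Assumption: team entries are lists of player dicts; skip the rest.
            if isinstance(value, list) and all(isinstance(p, dict) for p in value):
                for player in value:
                    rows.append({"game": game_index, **player})
    return rows


# Trimmed sample in the assumed shape, using values from the output above.
sample = {"results": [{"LAC": [
    {"firstName": "Patrick", "lastName": "Beverley", "pos": "PG", "team": "LAC"},
    {"firstName": "Kawhi", "lastName": "Leonard", "pos": "SF", "team": "LAC"},
]}]}

print(lineup_url(date(2019, 11, 18)))
for row in flatten_lineups(sample):
    print(row)
```

The list returned by flatten_lineups can be passed straight to pd.DataFrame(rows); with live data you would feed it requests.get(lineup_url(day)).json() instead of the sample.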
