2

I would like to turn the results of this web scrape into a pandas DataFrame so that I can export the data to Excel. Is anyone familiar with how to do this? I have seen different methods online and on this site, but I have been unable to reproduce their results with this scrape.

Here is the code so far:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])

This site seems useful but the examples are different:

https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

Here is another example from stackoverflow.com:

Loading web scraping results into Pandas DataFrame

I am new to coding/scraping, so any help would be greatly appreciated. Thanks in advance for your time and effort!

1
  • @Able, the answer provided below should help, I believe; however, I see the link is about JSON data. Commented Oct 29, 2018 at 18:27

3 Answers

6

I have added a solution that builds a DataFrame with one column per team; I hope this helps. Updated code:

import requests 

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
players = []
teams = []
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    teams.append(team['home_route'].capitalize())
    teams.append(team['away_route'].capitalize())
    temp = []
    temp1 = []
    for player in team['home_players']:
        print(player['name'])
        temp.append(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
        temp1.append(player['name'])

    players.append(temp)
    players.append(temp1)

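import pandas as pd

# One column per team. Wrapping each list in a Series pads shorter rosters
# with NaN instead of raising an error when the lists differ in length.
df = pd.DataFrame({team: pd.Series(names) for team, names in zip(teams, players)})

df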

To export to Excel, you can use:

df.to_excel('result.xlsx')
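
If every player needs to carry a team label, one option is a long-format table with one row per player instead of one column per team. The following is only a minimal sketch and has not been run against the live API: the `source` dict is a made-up sample that mimics the shape of the response used above (`home_route`, `away_route`, `home_players`, `away_players`), and the names `rows`, `side`, and `long_df` are my own.

```python
import pandas as pd

# Made-up sample with the same shape as the API response (not real data).
source = {
    "data": [
        {
            "home_route": "philadelphia-76ers",
            "away_route": "atlanta-hawks",
            "home_players": [{"name": "Fultz"}, {"name": "Embiid"}],
            "away_players": [{"name": "Young"}, {"name": "Dedmon"}],
        }
    ]
}

rows = []
for game in source["data"]:
    for side in ("home", "away"):
        team = game[side + "_route"]
        for player in game[side + "_players"]:
            # One row per player, tagged with the team and side.
            rows.append({"team": team, "side": side, "player": player["name"]})

long_df = pd.DataFrame(rows, columns=["team", "side", "player"])
print(long_df)
```

Calling `long_df.to_excel('result.xlsx', index=False)` would then write one row per player. Note that writing .xlsx files requires an Excel engine such as openpyxl to be installed.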

2 Comments

Thank you so much @Harry_pb! This was so informative =)
It looks like both teams are combined into one. The first 5 players (index 0-4) are listed on the correct teams; however, the last five players (index 5-9) are actually on different teams. Is there any way to include the teams for the bottom 5 players too? For example, the top 5 players (Fultz to Embiid) are on the 76ers. The bottom 5 players (Young to Dedmon) are on the Hawks. I would love it if we could include the teams for the bottom players as well. Thank you so much for your time!
1

The requests library's .json() method conveniently parses the JSON response into a dict, so you can pass it straight to the pd.DataFrame constructor.

import pandas as pd
df = pd.DataFrame([dict1, dict2, dict3])
# Do your data processing here
df.to_csv("myfile.csv")

Pandas also has pd.io.json with helpers like json_normalize (exposed as pd.json_normalize since pandas 1.0), so once your data is loaded you can flatten nested JSON into tabular data, and so on.
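
To make the difference concrete, here is a small sketch comparing the plain constructor with json_normalize. The `records` list is a hand-trimmed stand-in for dicts returned by `.json()` (only a few of the many fields are kept), and `raw_df` / `flat_df` are names I made up. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`.

```python
import pandas as pd

# Trimmed stand-in records; a real response has many more fields.
records = [
    {"away_route": "atlanta-hawks", "away_bets": {"key": "ATL", "spread": 11.0}},
    {"away_route": "portland-trail-blazers", "away_bets": {"key": "POR", "spread": 4.5}},
]

# The plain constructor keeps each nested dict as a single object column.
raw_df = pd.DataFrame(records)
print(raw_df.columns.tolist())

# json_normalize expands nested dicts into dotted column names.
flat_df = pd.json_normalize(records)
print(flat_df.columns.tolist())
```

The flattened frame can then be passed to `to_csv` like any other DataFrame.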

3 Comments

This seems very informative, however, I received an error 'dict1' is not defined.
Those are just dummy names, representing dicts returned by your calls to requests.get(some_url).json()
@AbleArcher, just use source, since it is itself a dict that you can create a DataFrame from. I have just illustrated that in my answer.
1

You can try something like the following:

>>> import pandas as pd
>>> import json
>>> import requests

>>> source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
>>> df = pd.DataFrame.from_dict(source)  # use source directly, since it is already a dict

Now you can write the DataFrame to a CSV file with df.to_csv as follows:

>>> df.to_csv("nba_play.csv")

Below are the columns, which you can process as needed:

>>> df.columns
Index(['bottom_header', 'bottom_paragraph', 'data', 'heading',
       'intro_paragraph', 'page_title', 'twitter_link'],
      dtype='object')

However, as Charles said, you can use json_normalize, which gives a better tabular view of the data:

>>> from pandas.io.json import json_normalize  # pandas >= 1.0: from pandas import json_normalize

>>> json_normalize(df['data']).head()
  away_bets.key  away_bets.moneyline away_bets.over_under  \
0           ATL                  500               o232.0
1           POR                  165               o217.0
2           SAC                  320               o225.0
3           BKN                  110               o216.0
4           TOR                 -140               o221.0

   away_bets.over_under_moneyline  away_bets.spread  \
0                            -115              11.0
1                            -115               4.5
2                            -105               9.0
3                            -105               2.0
4                            -105              -2.0

   away_bets.spread_moneyline  away_bets.total  \
0                        -110           121.50
1                        -105           110.75
2                        -115           117.00
3                        -110           109.00
4                        -115           109.50

                                       away_injuries  \
0  [{'name': 'J. Collins', 'profile_url': '/nba/p...
1  [{'name': 'M. Harkless', 'profile_url': '/nba/...
2  [{'name': 'K. Koufos', 'profile_url': '/nba/pl...
3  [{'name': 'T. Graham', 'profile_url': '/nba/pl...
4  [{'name': 'O. Anunoby', 'profile_url': '/nba/p...

                                        away_players              away_route  \
0  [{'draftkings_projection': 30.04, 'yahoo_posit...           atlanta-hawks
1  [{'draftkings_projection': 47.33, 'yahoo_posit...  portland-trail-blazers
2  [{'draftkings_projection': 28.88, 'yahoo_posit...        sacramento-kings
3  [{'draftkings_projection': 37.02, 'yahoo_posit...           brooklyn-nets
4  [{'draftkings_projection': 45.2, 'yahoo_positi...         toronto-raptors

   ...   nav.matchup_season           nav.matchup_time  \
0  ...                 2019  2018-10-29T23:00:00+00:00
1  ...                 2019  2018-10-29T23:00:00+00:00
2  ...                 2019  2018-10-29T23:30:00+00:00
3  ...                 2019  2018-10-29T23:30:00+00:00
4  ...                 2019  2018-10-30T00:00:00+00:00

   nav.status.away_team_score nav.status.home_team_score nav.status.minutes  \
0                        None                       None               None
1                        None                       None               None
2                        None                       None               None
3                        None                       None               None
4                        None                       None               None

  nav.status.quarter_integer  nav.status.seconds nav.status.status  \
0                                           None         Scheduled
1                                           None         Scheduled
2                                           None         Scheduled
3                                           None         Scheduled
4                                           None         Scheduled

                 nav.updated order
0  2018-10-29T17:51:05+00:00     0
1  2018-10-29T17:51:05+00:00     1
2  2018-10-29T17:51:05+00:00     2
3  2018-10-29T17:51:05+00:00     3
4  2018-10-29T17:51:05+00:00     4

[5 rows x 383 columns]
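
The player columns above still hold lists of dicts. As a rough sketch (not run against the live API), json_normalize's `record_path` and `meta` arguments can expand such a list into one row per player while carrying the team along. The `games` list is a made-up sample, "Player A" is a placeholder rather than a real roster entry, and `away_df` is my own name. It assumes pandas 1.0 or later for `pd.json_normalize`.

```python
import pandas as pd

# Made-up sample mimicking two entries of source['data'] (not real data).
games = [
    {"away_route": "atlanta-hawks",
     "away_players": [{"name": "Young"}, {"name": "Dedmon"}]},
    {"away_route": "portland-trail-blazers",
     "away_players": [{"name": "Player A"}]},  # placeholder name
]

# record_path selects the nested list to expand into rows;
# meta copies the listed parent fields onto every expanded row.
away_df = pd.json_normalize(games, record_path="away_players", meta=["away_route"])
print(away_df)
```

The same call with `home_players` and `home_route` would give the home side, and the two frames could be combined with `pd.concat`.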

Hope this helps.

1 Comment

Thank you so much @pygo... I am grateful for your help =)
