How to create pandas dataframe from web scrape?

Question

I would like to use this web scrape to create a pandas DataFrame so that I can export the data to Excel. Is anyone familiar with this? I have seen different methods online and on this site, but I have been unable to reproduce the results with this scrape.

Here is the code so far:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])

This site seems useful but the examples are different:

https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm

Here is another example from stackoverflow.com:

Loading web scraping results into Pandas DataFrame

I am new to coding/scraping, so any help will be greatly appreciated. Thanks in advance for your time and effort!

@Able, I have provided an answer below which I believe will help; however, I see the link is about JSON data.
– Karn Kumar, Commented Oct 29, 2018 at 18:27

Hari_pb · Accepted Answer · 2018-10-29 19:21:53Z

6

I have added a solution that builds a DataFrame with one column per team. I hope this helps. Updated code:

import pandas as pd
import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
players = []  # one list of player names per team, in the same order as teams
teams = []    # column labels: home team, then away team, for each game
for team in source['data']:
    teams.append(team['home_route'].capitalize())
    teams.append(team['away_route'].capitalize())
    home_names = []
    away_names = []
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
        home_names.append(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
        away_names.append(player['name'])

    players.append(home_names)
    players.append(away_names)

# One column per team; wrapping each list in a Series pads shorter lineups with NaN
df = pd.DataFrame({label: pd.Series(names) for label, names in zip(teams, players)})

df

To export to Excel, you can use:

df.to_excel('result.xlsx')

edited Oct 29, 2018 at 19:21

answered Oct 29, 2018 at 18:27


2 Comments

Able Archer Over a year ago

Thank you so much @Harry_pb! This was so informative =)

Able Archer Over a year ago

It looks like both teams are combined into one. The first 5 players (index 0-4) are listed on the correct teams. However, the last 5 players (index 5-9) are actually on different teams. Is there any way to include the teams for the bottom 5 players too? For example, the top 5 players (Fultz to Embiid) are on the 76ers. The bottom 5 players (Young to Dedmon) are on the Hawks. I would love it if we could include the teams for the bottom players as well. Thank you so much for your time!
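
One possible way to keep a team attached to every player, as asked in the comment above, is a long layout with one row per player instead of one column per team. The following is only a minimal sketch and not the answerer's code: `sample` is hypothetical stand-in data that mimics just the `home_route`, `away_route`, `home_players`, `away_players` and `name` keys used in the question, so it runs without calling the API.

```python
import pandas as pd

# Hypothetical stand-in for source['data']; the real response has many more keys.
sample = [
    {
        "home_route": "philadelphia-76ers",
        "away_route": "atlanta-hawks",
        "home_players": [{"name": "M. Fultz"}, {"name": "J. Embiid"}],
        "away_players": [{"name": "T. Young"}, {"name": "D. Dedmon"}],
    },
]

rows = []
for game in sample:
    # Handle the home and away lineups with the same code path
    for side in ("home", "away"):
        team = game[side + "_route"]
        for player in game[side + "_players"]:
            rows.append({"team": team, "side": side, "player": player["name"]})

# Each dict becomes one row, so every player keeps its own team label
players_df = pd.DataFrame(rows, columns=["team", "side", "player"])
print(players_df)
```

With the real data, `sample` would be replaced by `source['data']`, and `players_df.to_excel('result.xlsx', index=False)` should then write the table out, provided an Excel writer such as openpyxl is installed.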

Charles Landau · Accepted Answer · 2018-10-29 17:51:54Z

1

The requests library conveniently parses the JSON into a dict (via .json()), so you can pass the dict straight to the pd.DataFrame constructor.

import pandas as pd
df = pd.DataFrame([dict1, dict2, dict3])
# Do your data processing here
df.to_csv("myfile.csv")

Pandas also has pd.io.json with helpers like json_normalize, so once your data is loaded you can flatten nested JSON into tabular data, and so on.
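
As a rough illustration of the json_normalize helper mentioned above, the sketch below flattens a small nested structure. It assumes pandas 1.0 or later, where the function is exposed as `pd.json_normalize`. The `games` list is hypothetical sample data; only the key names (`away_route`, `away_players`, `name`, `draftkings_projection`, `nav`) are borrowed from the API response, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for source['data']; values are invented for illustration.
games = [
    {
        "away_route": "atlanta-hawks",
        "nav": {"status": {"status": "Scheduled"}},
        "away_players": [
            {"name": "T. Young", "draftkings_projection": 30.04},
            {"name": "D. Dedmon", "draftkings_projection": 21.5},
        ],
    },
    {
        "away_route": "brooklyn-nets",
        "nav": {"status": {"status": "Scheduled"}},
        "away_players": [
            {"name": "Player A", "draftkings_projection": 37.02},
        ],
    },
]

# Nested dicts become dotted column names; lists of dicts stay in a single cell.
flat_games = pd.json_normalize(games)

# record_path expands one list of dicts into rows;
# meta copies the named parent fields onto each of those rows.
away_players = pd.json_normalize(games, record_path="away_players", meta=["away_route"])
print(away_players)
```

Here `flat_games` has one row per game, while `away_players` has one row per player with the team carried along in `away_route`.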

answered Oct 29, 2018 at 17:51


3 Comments

Able Archer Over a year ago

This seems very informative, however, I received an error 'dict1' is not defined.

Charles Landau Over a year ago

Those are just dummy names, representing dicts returned by your calls to requests.get(some_url).json()

Karn Kumar Over a year ago

@AbleArcher, just use source, as it is itself a dict from which you can create a DataFrame. I have illustrated that in my answer.

Karn Kumar · Accepted Answer · 2018-10-29 18:23:10Z

You can try something like the following:

>>> import pandas as pd
>>> import json
>>> import requests

>>> source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
>>> df = pd.DataFrame.from_dict(source) # directly use source as itself is a dict

Now you can write the DataFrame to CSV with df.to_csv as follows:

>>> df.to_csv("nba_play.csv")

Below are the columns, which you can process as desired:

>>> df.columns
Index(['bottom_header', 'bottom_paragraph', 'data', 'heading',
       'intro_paragraph', 'page_title', 'twitter_link'],
      dtype='object')
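
The column list above suggests that the top-level keys of the response become columns, while the per-game records sit inside `data`. A minimal sketch of that behaviour follows. The `source` dict here is a hypothetical, much smaller stand-in for the real response (only `heading`, `page_title` and `data` are mimicked, with invented values), and passing `source['data']` directly to the constructor is an alternative shown for comparison, not what the answer does.

```python
import pandas as pd

# Hypothetical stand-in for the parsed API response.
source = {
    "heading": "NBA Lineups",
    "page_title": "NBA Starting Lineups",
    "data": [
        {"home_route": "philadelphia-76ers", "away_route": "atlanta-hawks"},
        {"home_route": "indiana-pacers", "away_route": "portland-trail-blazers"},
    ],
}

# Top-level keys become columns; scalar values are repeated on every row,
# and each element of the 'data' list lands in its own row as a dict.
whole = pd.DataFrame.from_dict(source)

# Building from the list of game dicts gives one column per game-level key.
games = pd.DataFrame(source["data"])

print(whole.columns.tolist())
print(games)
```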

However, as Charles said, you can use json_normalize, which gives a better tabular view of the data:

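>>> # In pandas 1.0 and later, use pd.json_normalize instead; importing it from pandas.io.json is deprecated
>>> from pandas.io.json import json_normalize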
>>> json_normalize(df['data']).head()
  away_bets.key  away_bets.moneyline away_bets.over_under  \
0           ATL                  500               o232.0
1           POR                  165               o217.0
2           SAC                  320               o225.0
3           BKN                  110               o216.0
4           TOR                 -140               o221.0

   away_bets.over_under_moneyline  away_bets.spread  \
0                            -115              11.0
1                            -115               4.5
2                            -105               9.0
3                            -105               2.0
4                            -105              -2.0

   away_bets.spread_moneyline  away_bets.total  \
0                        -110           121.50
1                        -105           110.75
2                        -115           117.00
3                        -110           109.00
4                        -115           109.50

                                       away_injuries  \
0  [{'name': 'J. Collins', 'profile_url': '/nba/p...
1  [{'name': 'M. Harkless', 'profile_url': '/nba/...
2  [{'name': 'K. Koufos', 'profile_url': '/nba/pl...
3  [{'name': 'T. Graham', 'profile_url': '/nba/pl...
4  [{'name': 'O. Anunoby', 'profile_url': '/nba/p...

                                        away_players              away_route  \
0  [{'draftkings_projection': 30.04, 'yahoo_posit...           atlanta-hawks
1  [{'draftkings_projection': 47.33, 'yahoo_posit...  portland-trail-blazers
2  [{'draftkings_projection': 28.88, 'yahoo_posit...        sacramento-kings
3  [{'draftkings_projection': 37.02, 'yahoo_posit...           brooklyn-nets
4  [{'draftkings_projection': 45.2, 'yahoo_positi...         toronto-raptors

   ...   nav.matchup_season           nav.matchup_time  \
0  ...                 2019  2018-10-29T23:00:00+00:00
1  ...                 2019  2018-10-29T23:00:00+00:00
2  ...                 2019  2018-10-29T23:30:00+00:00
3  ...                 2019  2018-10-29T23:30:00+00:00
4  ...                 2019  2018-10-30T00:00:00+00:00

   nav.status.away_team_score nav.status.home_team_score nav.status.minutes  \
0                        None                       None               None
1                        None                       None               None
2                        None                       None               None
3                        None                       None               None
4                        None                       None               None

  nav.status.quarter_integer  nav.status.seconds nav.status.status  \
0                                           None         Scheduled
1                                           None         Scheduled
2                                           None         Scheduled
3                                           None         Scheduled
4                                           None         Scheduled

                 nav.updated order
0  2018-10-29T17:51:05+00:00     0
1  2018-10-29T17:51:05+00:00     1
2  2018-10-29T17:51:05+00:00     2
3  2018-10-29T17:51:05+00:00     3
4  2018-10-29T17:51:05+00:00     4

[5 rows x 383 columns]

Hope this helps.
