
I am trying to webscrape the "Active Positions" table from the following website:

https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings

My code is below:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.nasdaq.com/market-activity/stocks/aapl/institutional-holdings')
soup = BeautifulSoup(html_text, 'lxml')
job1 = soup.find('div', classs_ = 'dialog-off-canvas-main-canvas')
job2 = job1.find('div', class_ = 'page with-primary-nav hide-more-videos')
job3 = job2.find('div', class_ = 'page__main')
job4 = job3.find('div', class_ = 'page__content')
job5 = job4.find('div', class_ = 'quote-subdetail__content quote-subdetail__content--new')
job6 = job5.findAll('div', class_ = 'layout layout--2-col-large')
job7 = job6.find('div', class_ = 'institutional-holdings institutional-holdings--paginated')
job8 = job7.find('div', class_ = 'institutional-holdings__section institutional-holdings__section--active-positions')
job9 = job8.find('div', class_ = 'institutional-holdings__table-container')
job10 = job9.find('table', class_ = 'institutional-holdings__table')
job11 = job10.find('tbody', class_ = 'institutional-holdings__body')
job12 = job11.findAll('tr', class_ = 'institutional-holdings__row').text

print(job12)

I have chosen to include nearly every class in the path to attempt to speed up the execution, as including only a couple took up to 10 minutes before I decided to interrupt. However, I still get the same long execution with no output. Is there something wrong with my code? Or can I improve this by doing something I haven't thought of? Thanks.

1 Answer

Data is being hydrated into the page via JavaScript XHR calls, so the table is not in the HTML that requests downloads. Here is a way of getting the Active Positions data by scraping the API endpoint directly:

import requests
import pandas as pd

url = 'https://api.nasdaq.com/api/company/AAPL/institutional-holdings?limit=15&type=TOTAL&sortColumn=marketValue&sortOrder=DESC'

headers = {
    'accept': 'application/json, text/plain, */*',
    'origin': 'https://www.nasdaq.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data']['activePositions']['rows'])
print(df)

Result in terminal:

                    positions holders         shares
0         Increased Positions   1,780    239,170,203
1         Decreased Positions   2,339    209,017,331
2              Held Positions     283  8,965,339,255
3  Total Institutional Shares   4,402  9,413,526,789
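Note that the API returns the numbers as comma-formatted strings. If you want to do arithmetic on them, a small follow-up step converts them to integers (a sketch using the sample rows from the output above, not part of the original answer):

```python
# Sketch (assumed follow-up, not from the original answer): convert the
# comma-formatted string columns of the scraped frame to integers.
# The sample rows mirror the printed output above.
import pandas as pd

df = pd.DataFrame({
    'positions': ['Increased Positions', 'Decreased Positions',
                  'Held Positions', 'Total Institutional Shares'],
    'holders': ['1,780', '2,339', '283', '4,402'],
    'shares': ['239,170,203', '209,017,331', '8,965,339,255', '9,413,526,789'],
})

# Strip the thousands separators and cast to 64-bit integers.
for col in ('holders', 'shares'):
    df[col] = df[col].str.replace(',', '', regex=False).astype('int64')

print(df.dtypes)
```

After this, sums and comparisons work as expected (e.g. the first three `holders` rows add up to the total row).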

In case you want to scrape the big 4,402-row Institutional Holders table, there are ways to do that too.

EDIT: Here is how you can save the data to a JSON file:

df.to_json('active_positions.json')

Although it might make more sense to save it as tabular data (CSV):

df.to_csv('active_positions.csv')
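As a quick sanity check (a sketch, not part of the original answer), the frame survives a CSV round trip. An in-memory `io.StringIO` buffer stands in for `active_positions.csv` here so the example needs no filesystem access:

```python
# Sketch: verify the DataFrame round-trips through CSV unchanged.
# io.StringIO is used in place of a file on disk for illustration.
import io
import pandas as pd

df = pd.DataFrame({
    'positions': ['Increased Positions', 'Decreased Positions'],
    'holders': ['1,780', '2,339'],
})

buf = io.StringIO()
df.to_csv(buf, index=False)   # same call as above, minus the filename
buf.seek(0)

df2 = pd.read_csv(buf)
print(df2.equals(df))         # the round-tripped frame matches
```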

Pandas docs: https://pandas.pydata.org/docs/


6 Comments

Thank you! I noticed you've included JSON with the code; how would I be able to save the output data into a JSON file?
Welcome @kiestuthridge23. I edited my answer to show you how you can save the data to JSON, and also to CSV.
That's great, thanks. Also, how would I be able to scrape the larger table below, as you mentioned?
There is a different API for that one - you can find it under Dev tools, in the Network tab. If you have difficulties, post a new question (as it is really a new question, based on my suggestion :) )
I will give you a solution to your new question if you will ask it @kiestuthridge23
