Webscrape HTML in Python Using BeautifulSoup

Question

I'm trying to scrape a specific HTML table from a website. The table contains the following columns:

Balance
Addresses
% Addresses (Total)
Coins
USD
% Coins (Total)

The code that I am using is below:

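import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://app.intotheblock.com/coin/AMP/deep-dive?group=ownership&chart=all"

r = requests.get(url)
html = r.text

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html, "html.parser")
# find() returns None when no matching table is in the fetched HTML
table = soup.find('table', {"class": "sc-lhVmIH fYUufF sc-cmTdod gUHuZc"})

rows = table.find_all('tr')  # this is the line that raises the error
data = []
for row in rows[1:]:  # skip the header row
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # drop empty cells

result = pd.DataFrame(data, columns=['Balance','Addresses','% Addresses (Total)','Coins','USD','% Coins (Total)'])

print(result)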

I inspected the webpage to find the class of the table, but even with the class I thought was the right one, I keep getting an error on the line "rows = table.find_all('tr')". This tells me that I am not selecting the right class for the table I want to scrape.

I also wrote code that automatically logs in to the website (enters the credentials and clicks the login button) and navigates to the page I want to scrape, but the table comes back empty. The class I chose appeared right after the table data in the page source, so I assumed it was the correct one to use.

The specific page I am trying to scrape is below:

Link: https://app.intotheblock.com/coin/AMP/deep-dive?group=ownership&chart=all

The website requires you to sign up before you can see the data, so I will post a picture below of the table and its markup to show the class I chose. I would greatly appreciate any help, as I am stuck on what I am doing wrong here.

dcsuka · Accepted Answer · 2022-07-13 19:18:34Z

1

That page is rendered with JavaScript, so the table is not in the HTML that requests receives. It is easier to scrape the underlying API instead.
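One quick way to confirm this is to count the `<table>` tags in the raw HTML before looking for a class. The following is only a minimal sketch using the standard library; the two HTML strings are made-up stand-ins (not the real page) for what a JS-driven app tends to serve before and after rendering.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    """Return how many <table> elements open in the given HTML text."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical example: the bare shell a JS app serves before any script runs
static_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
# Hypothetical example: the DOM after the browser has rendered the table
rendered = '<html><body><div id="root"><table><tr><td>1</td></tr></table></div></body></html>'

print(count_tables(static_shell))  # 0
print(count_tables(rendered))      # 1
```

If `count_tables(r.text)` gives 0 for the real response, no class name will make `soup.find('table', ...)` succeed, because the table only exists after the browser runs the page's scripts.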

In Google Chrome, open Inspect > Network > XHR and search for "api". Find the request you need, then build a Python request that fetches the JSON using the authorization token, like this:

import requests

url = "https://services.intotheblock.com/api/internal/metrics/coin/8bdae7d9-b8ff-41a1-8229-2dd07f047845/ownership/holdings_distribution_matrix"

# Copy the token from the request headers shown in the Network tab
headers = {
    'Authorization': 'Bearer some long text'
}

response = requests.get(url, headers=headers)
response.raise_for_status()  # fail loudly on 401/403 instead of parsing an error page
data = response.json()
print(data)
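If you would rather not paste the token into the script, something along these lines might work. It is a rough sketch that swaps in the standard library's `urllib.request` for `requests`; the environment variable name `ITB_TOKEN` and both helper functions are my own assumptions, not anything the site documents.

```python
import json
import os
import urllib.request

API_URL = (
    "https://services.intotheblock.com/api/internal/metrics/coin/"
    "8bdae7d9-b8ff-41a1-8229-2dd07f047845/ownership/holdings_distribution_matrix"
)


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the bearer token (hypothetical helper)."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(req: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# ITB_TOKEN is an assumed variable name; the fallback is the placeholder from above
token = os.environ.get("ITB_TOKEN", "some long text")
req = build_request(API_URL, token)
# data = fetch_json(req)  # uncomment once a real token is set
```

The token copied from the browser will likely expire at some point, so expect to refresh it from the Network tab when the request starts returning an authorization error.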

edited Jul 13, 2022 at 19:18

answered Jul 13, 2022 at 4:59


4 Comments

DBT Over a year ago

Is there any way to read in the HTML as text? I realize the easier way would be to use an API, but the company's API can be quite expensive for a single piece of data that I would like to analyze first. Is there really no way to read the HTML page as text because it needs JS?

DBT Over a year ago

Would something like this solution work? Link to Scraping Dynamic Webpage

dcsuka Over a year ago

Then just scrape with the heavier solution, such as Selenium, which you already use to automate the login process. Let me know if this helps: stackoverflow.com/questions/60899709/…

DBT Over a year ago

Yes! This helped me so much. I finally got it using XPath, thank you very much!
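For anyone landing here later, the Selenium and XPath route might look roughly like the sketch below. The working code was not posted, so this is a guess at the shape of it: the `//table//tr` XPath, both function names, and the sample cell values are assumptions, and `driver` is assumed to be a Selenium WebDriver that is already logged in and on the page.

```python
COLUMNS = ["Balance", "Addresses", "% Addresses (Total)",
           "Coins", "USD", "% Coins (Total)"]


def scrape_rows(driver, xpath="//table//tr"):
    """Read the text of every <td> in each table row of the rendered page.

    The XPath is an assumption; adjust it to the real table.
    """
    from selenium.webdriver.common.by import By  # imported here so the rest runs without Selenium

    rows = []
    for tr in driver.find_elements(By.XPATH, xpath):
        rows.append([td.text for td in tr.find_elements(By.XPATH, "./td")])
    return rows


def rows_to_records(rows, columns=COLUMNS):
    """Turn raw cell lists into dicts, skipping header or incomplete rows."""
    records = []
    for cells in rows:
        cleaned = [c.strip() for c in cells if c and c.strip()]
        if len(cleaned) == len(columns):
            records.append(dict(zip(columns, cleaned)))
    return records


# Made-up placeholder cells, only to show the shape of the output
sample = [
    [],                                      # header row: <th> cells only, so no <td> text
    [" b1 ", "a1", "p1", "c1", "u1", "q1"],  # one complete data row
]
records = rows_to_records(sample)
print(records[0]["Balance"])  # b1
```

The list of dicts can be passed straight to `pd.DataFrame(records)` to get the same frame the question was building. The table may take a moment to render after navigation, so an explicit wait before calling `scrape_rows` is probably needed.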
