How To Webscrape A Page With Javascript Using Python?

Question

I'm having a bit of trouble and can't seem to figure this out.

I am trying to scrape the URL below to get the body text, but I seem to be running into issues because of JavaScript. Does anyone have suggestions on how to pull the text? Is this even possible? What is the best library to use?

https://www.solanalysis.com/

You haven't given us any detail at all on how you're scraping. We can't make suggestions if we don't know what you're doing.
– John Gordon, Commented Jan 24, 2022 at 2:16

chitown88 · Accepted Answer · 2022-01-24 15:52:02Z

I would use the XHR request that fetched the data directly:

import requests
import pandas as pd

url = 'https://solanalysis-graphql-dot-feliz-finance.uc.r.appspot.com/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
payload = {"operationName":"GetProjectStatsQuery",
           "variables":{"pagination_info":{"page_number":1,"page_size":100},
             "conditions":{},
             "order_by":[{"field_name":"market_cap","sort_order":"DESC"}]},
           "query":"query GetProjectStatsQuery($conditions: GetProjectStatsCondition, $order_by: [OrderConfig!], $pagination_info: PaginationConfig) {\n  getProjectStats(\n    conditions: $conditions\n    order_by: $order_by\n    pagination_info: $pagination_info\n  ) {\n    project_stats {\n      project_id\n      market_cap\n      volume_7day\n      volume_1day_change\n      floor_price\n      average_price\n      average_price_1day_change\n      max_price\n      twitter_followers\n      num_of_token_listed\n      project {\n        supply\n        website\n        img_url\n        display_name\n        __typename\n      }\n      __typename\n    }\n    pagination_info {\n      total_page_number\n      current_page_number\n      has_next_page\n      current_page_size\n      __typename\n    }\n    __typename\n  }\n}\n"}

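# POST the GraphQL query as JSON and stop early on an HTTP error status
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
jsonData = response.json()

# Each entry in project_stats becomes one row of the DataFrame
df = pd.DataFrame(jsonData['data']['getProjectStats']['project_stats'])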

Output:

print(df.head(5).to_string())
             project_id  market_cap  volume_7day  volume_1day_change  floor_price  average_price  average_price_1day_change  max_price  twitter_followers  num_of_token_listed                                                                                                                                                                                                                                                     project   __typename
0  shadowysupercoderdao   117666155       788363             -0.0510        115.0     119.822968                    -0.0085     133.00              30267                   57                                {'supply': 10000, 'website': 'https://genesysgo.com/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/shadowysupercoderdao.png', 'display_name': 'Shadowy Super Coder DAO', '__typename': 'Project'}  ProjectStat
1                   smb    82297085      1119240             -0.0572        137.0     167.117647                     0.0259     555.00              76989                  654                                                  {'supply': 5000, 'website': 'https://market.solanamonkey.business', 'img_url': 'https://storage.googleapis.com/feliz-crypto/smb.jpg', 'display_name': 'Solana Monkey Business', '__typename': 'Project'}  ProjectStat
2              degenape    47488440       432058             -0.0175         36.0      48.349231                    -0.0283     130.00              95024                 1487  {'supply': 10000, 'website': 'https://www.degenape.academy/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/degenape/small/127RACV8SfCbbVrLdRbukh63zCDcubW4xVGh6aV6pnZi.jpg', 'display_name': 'Degenerate Ape Academy', '__typename': 'Project'}  ProjectStat
3        boryokudragonz    30880912       921541              0.1739        280.0     221.028112                     0.0342     269.69              18831                   12                                           {'supply': 1111, 'website': 'https://boryokudragonz.io/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/boryokudragonz.png', 'display_name': 'Boryoku Dragonz', '__typename': 'Project'}  ProjectStat
4                aurory    26850233       257762             -0.0401         22.0      27.342396                    -0.0001      28.90             178981                 1073                                                                           {'supply': 10000, 'website': 'https://app.aurory.io', 'img_url': 'https://storage.googleapis.com/feliz-crypto/aurorylogo.png', 'display_name': 'Aurory', '__typename': 'Project'}  ProjectStat
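The output above is only the first page of 100 rows, and the project column still holds a nested dict. Since the query already asks for pagination_info.has_next_page, one way to collect every page and flatten that column might look like the sketch below. The names fetch_all_project_stats, flatten_project, post and max_pages are my own, not part of the site's API, and I am assuming has_next_page comes back as a plain boolean.

```python
import copy


def fetch_all_project_stats(base_payload, post, max_pages=50):
    """Collect project_stats from every page by following has_next_page.

    post is any callable that takes a payload dict and returns the decoded
    JSON response. max_pages is a safety cap (an assumption, not an API limit).
    """
    rows = []
    payload = copy.deepcopy(base_payload)  # leave the caller's dict untouched
    for page_number in range(1, max_pages + 1):
        payload['variables']['pagination_info']['page_number'] = page_number
        result = post(payload)['data']['getProjectStats']
        rows.extend(result['project_stats'])
        if not result['pagination_info']['has_next_page']:
            break
    return rows


def flatten_project(row):
    """Lift the nested 'project' dict into top-level 'project_*' keys."""
    flat = {key: value for key, value in row.items() if key != 'project'}
    for key, value in (row.get('project') or {}).items():
        flat[f'project_{key}'] = value
    return flat
```

With the names from the code above, rows = fetch_all_project_stats(payload, post=lambda p: requests.post(url, headers=headers, json=p).json()) followed by pd.DataFrame([flatten_project(r) for r in rows]) should give one flat table. pd.json_normalize(rows) is a built-in alternative for the flattening step.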

How can I do the same for hisit.com to extract job information?

Oleksii Tambovtsev · Accepted Answer · 2022-01-24 02:14:35Z

You have two options:

1. Use Selenium to scrape data that is loaded by JavaScript.
2. Inspect the XHR requests the page makes and work out how the data is loaded. You may be able to send the same request yourself with a simple library such as requests and get the data you want.
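For the Selenium option, a minimal sketch might look like this. It assumes Selenium 4 with a local Chrome install, and that a non-empty body is a good enough sign that the page has rendered. The helper names wait_for_text and get_rendered_body_text are made up for illustration; Selenium's own WebDriverWait could replace the small polling loop.

```python
import time


def wait_for_text(read_text, timeout=10.0, poll=0.5):
    """Call read_text() until it returns non-blank text or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        text = read_text()
        if text and text.strip():
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError('page text did not appear in time')
        time.sleep(poll)


def get_rendered_body_text(url, timeout=10.0):
    """Load url in headless Chrome and return the body text after JS has run."""
    # Imported here so wait_for_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument('--headless=new')  # older Chrome versions use '--headless'
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return wait_for_text(
            lambda: driver.find_element(By.TAG_NAME, 'body').text, timeout
        )
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Calling get_rendered_body_text('https://www.solanalysis.com/') would then return whatever text the browser shows. This is slower than replaying the XHR request, so I would only reach for it when the underlying request is hard to reproduce.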
