
Having a bit of trouble and can't seem to figure this out...

I am trying to scrape the body text from the URL below, but it seems I am running into issues because the page relies on JavaScript. Does anyone have suggestions on how to pull the text? Is this even possible? What is the best library to use?

https://www.solanalysis.com/

  • Can you share what you've tried so far? Commented Jan 24, 2022 at 2:01
  • You haven't given us any detail at all on how you're scraping. We can't make suggestions if we don't know what you're doing. Commented Jan 24, 2022 at 2:16

2 Answers


I would skip the rendered HTML and call the XHR request that fetches the data directly (it is a GraphQL POST):

import requests
import pandas as pd

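# GraphQL endpoint the page calls via XHR (visible in the browser's Network tab)
url = 'https://solanalysis-graphql-dot-feliz-finance.uc.r.appspot.com/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
# Same operation, variables and query string the site sends; page_size caps the rows per request
payload = {"operationName":"GetProjectStatsQuery",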
           "variables":{"pagination_info":{"page_number":1,"page_size":100},
             "conditions":{},
             "order_by":[{"field_name":"market_cap","sort_order":"DESC"}]},
           "query":"query GetProjectStatsQuery($conditions: GetProjectStatsCondition, $order_by: [OrderConfig!], $pagination_info: PaginationConfig) {\n  getProjectStats(\n    conditions: $conditions\n    order_by: $order_by\n    pagination_info: $pagination_info\n  ) {\n    project_stats {\n      project_id\n      market_cap\n      volume_7day\n      volume_1day_change\n      floor_price\n      average_price\n      average_price_1day_change\n      max_price\n      twitter_followers\n      num_of_token_listed\n      project {\n        supply\n        website\n        img_url\n        display_name\n        __typename\n      }\n      __typename\n    }\n    pagination_info {\n      total_page_number\n      current_page_number\n      has_next_page\n      current_page_size\n      __typename\n    }\n    __typename\n  }\n}\n"}

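response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # stop here on HTTP errors instead of failing inside .json()
jsonData = response.json()

# One row per project; the nested 'project' dict stays in a single column
df = pd.DataFrame(jsonData['data']['getProjectStats']['project_stats'])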

print(df.head(5).to_string())

Output:
             project_id  market_cap  volume_7day  volume_1day_change  floor_price  average_price  average_price_1day_change  max_price  twitter_followers  num_of_token_listed                                                                                                                                                                                                                                                     project   __typename
0  shadowysupercoderdao   117666155       788363             -0.0510        115.0     119.822968                    -0.0085     133.00              30267                   57                                {'supply': 10000, 'website': 'https://genesysgo.com/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/shadowysupercoderdao.png', 'display_name': 'Shadowy Super Coder DAO', '__typename': 'Project'}  ProjectStat
1                   smb    82297085      1119240             -0.0572        137.0     167.117647                     0.0259     555.00              76989                  654                                                  {'supply': 5000, 'website': 'https://market.solanamonkey.business', 'img_url': 'https://storage.googleapis.com/feliz-crypto/smb.jpg', 'display_name': 'Solana Monkey Business', '__typename': 'Project'}  ProjectStat
2              degenape    47488440       432058             -0.0175         36.0      48.349231                    -0.0283     130.00              95024                 1487  {'supply': 10000, 'website': 'https://www.degenape.academy/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/degenape/small/127RACV8SfCbbVrLdRbukh63zCDcubW4xVGh6aV6pnZi.jpg', 'display_name': 'Degenerate Ape Academy', '__typename': 'Project'}  ProjectStat
3        boryokudragonz    30880912       921541              0.1739        280.0     221.028112                     0.0342     269.69              18831                   12                                           {'supply': 1111, 'website': 'https://boryokudragonz.io/', 'img_url': 'https://storage.googleapis.com/feliz-crypto/project_photos/boryokudragonz.png', 'display_name': 'Boryoku Dragonz', '__typename': 'Project'}  ProjectStat
4                aurory    26850233       257762             -0.0401         22.0      27.342396                    -0.0001      28.90             178981                 1073                                                                           {'supply': 10000, 'website': 'https://app.aurory.io', 'img_url': 'https://storage.googleapis.com/feliz-crypto/aurorylogo.png', 'display_name': 'Aurory', '__typename': 'Project'}  ProjectStat
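The query above already asks for `pagination_info` (including `has_next_page`) and caps each response at `page_size` 100 rows, and the output shows `project` as one nested dict column. A minimal sketch that walks every page and flattens that column could look like this, assuming the endpoint keeps honoring `page_number` the way the payload suggests. The helper names `post_graphql`, `fetch_all_pages` and `to_flat_frame` are hypothetical, and `fetch` is injectable so the paging logic can be checked without hitting the site.

```python
import copy
from typing import Callable, Dict, List

import pandas as pd

# Endpoint and User-Agent copied from the answer above
GRAPHQL_URL = 'https://solanalysis-graphql-dot-feliz-finance.uc.r.appspot.com/'
HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}


def post_graphql(payload: Dict) -> Dict:
    """Send one GraphQL request and return the decoded JSON body."""
    import requests  # imported here so the paging logic can be exercised offline
    response = requests.post(GRAPHQL_URL, headers=HEADERS, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_pages(base_payload: Dict,
                    fetch: Callable[[Dict], Dict] = post_graphql,
                    max_pages: int = 100) -> List[Dict]:
    """Request pages 1, 2, ... until pagination_info.has_next_page is false."""
    rows: List[Dict] = []
    for page in range(1, max_pages + 1):  # max_pages guards against an endless loop
        payload = copy.deepcopy(base_payload)  # leave the caller's payload untouched
        payload['variables']['pagination_info']['page_number'] = page
        stats = fetch(payload)['data']['getProjectStats']
        rows.extend(stats['project_stats'])
        if not stats['pagination_info']['has_next_page']:
            break
    return rows


def to_flat_frame(rows: List[Dict]) -> pd.DataFrame:
    """Expand the nested 'project' dict into project.supply, project.display_name, ... columns."""
    return pd.json_normalize(rows)
```

With the `payload` dict from the answer, `df = to_flat_frame(fetch_all_pages(payload))` should give every project across all pages in one flat table. The `max_pages=100` cap is an arbitrary safety limit, not something the API defines.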

2 Comments

Thank you so much! I love you.
How to do the same for hisit.com to extract job information?

You have two options:

  1. You can use Selenium to scrape the data loaded by JavaScript; it drives a real browser, so the scripts run before you read the page.
  2. You can explore the XHR requests the page makes (for example, in the browser's developer tools) and figure out how the data is loaded. You may then be able to send the same XHR request yourself, for instance with the simple requests library, and get the desired data directly.
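For option 2, a minimal sketch of replaying a captured request might look like the following. It assumes you have copied the request URL and the raw JSON request payload from the Network tab; `build_xhr_request` and `send_xhr` are hypothetical helper names, and `https://example.com/graphql` below is only a placeholder. Building a `PreparedRequest` first lets you inspect exactly what will be sent before any network call is made.

```python
import json
from typing import Optional

import requests


def build_xhr_request(url: str, raw_payload: str,
                      user_agent: Optional[str] = None) -> requests.PreparedRequest:
    """Rebuild a captured XHR POST from its URL and the raw JSON payload shown in DevTools."""
    payload = json.loads(raw_payload)  # fail early if the copied payload is not valid JSON
    headers = {'Accept': 'application/json'}
    if user_agent:
        # some sites may reject the default python-requests User-Agent
        headers['User-Agent'] = user_agent
    # json= serialises the payload and sets Content-Type: application/json
    return requests.Request('POST', url, headers=headers, json=payload).prepare()


def send_xhr(prepared: requests.PreparedRequest, timeout: float = 30) -> dict:
    """Send the prepared request and return the decoded JSON response."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
        response.raise_for_status()
        return response.json()
```

Usage would be along the lines of `data = send_xhr(build_xhr_request(url, raw_payload, user_agent='Mozilla/5.0'))`; if the site also checks cookies or tokens, those headers would have to be copied from the captured request as well.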

