Scraping data from the <script> tag using Python

Question

I am looking for a way to scrape data from this page into a Python list. The data I want to extract is in the chart options at

series -> data

Each entry is a [timestamp, price] pair giving the price of a specific item at a certain time. I need to strip away all of the JavaScript code and keep only the data, which I will then use for plotting and calculations.

I am new to web-scraping and I am looking for a simple one-time solution. What would be the best way to approach this problem?

document.addEventListener('DOMContentLoaded', function () {
    var myChart = Highcharts.stockChart('stocks-container', {
        rangeSelector: {
            selected: 1
        },
        yAxis: [{
            labels: {
                align: 'left'
            },
            height: '80%',
            resize: {
                enabled: true
            }
        }, {
            labels: {
                align: 'left'
            },
            top: '80%',
            height: '20%',
            offset: 0
        }],
        plotOptions: {
            column: {
                stacking: 'normal'
            }
        },
        series: [
            {
                name: 'Unit Price (Buy)',
                data: JSON.parse("[[1585902517017,187893.6],[1585906117013,193975.7],[1585909717026,189253.9],[1585913317001,195890.9],[1585916917027,197659.8],[1585920516999,201482.1],[1585924117021,198212.5],[1585927716997,208305.0],[1585929517008,207305.0],[1585933117021,193561.7],[1585936716979,199070.6],[1585938517019,195450.9],[1585942117009,195527.4],[1585945717007,195877.6],
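
Every series in this script follows the same pattern: a name: '...' entry followed by data: JSON.parse("..."), and the string handed to JSON.parse is already plain JSON. So nothing has to be executed as JavaScript; it is enough to cut each string out with a regular expression and pass it to Python's json.loads. Below is a minimal offline sketch under that assumption. The script string is a shortened copy containing only the first three points shown above, and SERIES_RE and extract_all_series are hypothetical names chosen for illustration. The pattern will stop matching if the page layout changes.

```python
import json
import re

# Shortened copy of the <script> text from the question (first three points
# of the 'Unit Price (Buy)' series only); the real page is much longer.
script = '''
series: [
    {
        name: 'Unit Price (Buy)',
        data: JSON.parse("[[1585902517017,187893.6],[1585906117013,193975.7],[1585909717026,189253.9]]")
    }
]
'''

# Assumption: each series is written as  name: '...',  followed by
# data: JSON.parse("...")  and the JSON string contains no double quotes.
SERIES_RE = re.compile(
    r"name:\s*'([^']*)'\s*,\s*data:\s*JSON\.parse\(\"([^\"]*)\"\)"
)


def extract_all_series(text):
    """Return {series name: list of [timestamp_ms, value] pairs}."""
    return {name: json.loads(raw) for name, raw in SERIES_RE.findall(text)}


series = extract_all_series(script)
print(series['Unit Price (Buy)'][0])  # [1585902517017, 187893.6]
```

Because the function only needs a string, it can be applied to the text of the downloaded page once the layout has been checked against it.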

Does this answer your question? How to extract a JSON object that was defined in a HTML page javascript block using Python?
– CodeIt, commented Jul 20, 2020 at 16:43

Andrej Kesely · Accepted Answer · 2020-07-20 18:20:09Z

You can extract the data with the re module and parse it with the json module.

For example:

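import re
import json
import requests


url = 'https://stonks.gg/products/search?input=Superior%20Fragment'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
html_data = response.text


def extract_series(name, html):
    """Return the first [[...]] array that follows the series name in the page."""
    # re.escape() handles the parentheses in names like 'Unit Price (Buy)';
    # re.S lets '.' match the newlines between the name and the data.
    match = re.search(re.escape(name) + r'.*?(\[\[.*?\]\])', html, flags=re.S)
    if match is None:
        raise ValueError(f'series {name!r} not found in the page')
    return json.loads(match.group(1))


d1 = extract_series('Unit Price (Buy)', html_data)
d2 = extract_series('Unit Price (Sell)', html_data)
d3 = extract_series('Instant Buy Volume', html_data)
d4 = extract_series('Instant Sell Volume', html_data)

print(d1)
print(d2)
print(d3)
print(d4)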

Prints:

[[1585902517017, 187893.6], [1585906117013, 193975.7], [1585909717026, 189253.9], [1585913317001, 195890.9], [1585916917027, 197659.8], [1585920516999, 201482.1], [1585924117021, 198212.5], [1585927716997, 208305.0], [1585929517008, 207305.0], [1585933117021, 193561.7], [1585936716979, 199070.6], [1585938517019, 195450.9], [1585942117009, 195527.4], [1585945717007, 195877.6], [1585949317016, 198097.6], [1585952917006, 200590.3], [1585956517023, 198363.7], [1585958317074, 193681.3], [1585961917009, 199628.0], [1585967317017, 197546.9], [1585969117024, 195719.5], [1585972716979, 198053.2], [1585974516979, 195370.3], [1585976317029, 194257.0], [1585979917012, 195980.4], [1585981717045, 199915.4], [1585985316979, 199097.0], [1585987117024, 199425.4], [1585990717024, 198317.1], [1585994316979, 207382.3], [1585996117030, 199845.9], [1585999717009, 200711.5], 

...and so on.
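
Since the goal in the question is plotting and calculations, the next step is usually to split each pair into a time and a value. The timestamps are JavaScript-style milliseconds since the Unix epoch (UTC), so they must be divided by 1000 before Python's datetime can use them. A minimal standard-library sketch, assuming d1 has the shape printed above (only its first four points are copied in here); to_time_series is a hypothetical helper name:

```python
from datetime import datetime, timezone
from statistics import mean

# First four [timestamp_ms, price] pairs from the d1 output above.
d1 = [[1585902517017, 187893.6], [1585906117013, 193975.7],
      [1585909717026, 189253.9], [1585913317001, 195890.9]]


def to_time_series(pairs):
    """Split [timestamp_ms, value] pairs into (datetimes, values).

    The timestamps are milliseconds since the Unix epoch, so they are
    divided by 1000 before being converted to timezone-aware UTC datetimes.
    """
    times = [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for ms, _ in pairs]
    values = [value for _, value in pairs]
    return times, values


times, prices = to_time_series(d1)
print(times[0].strftime('%Y-%m-%d %H:%M:%S %Z'))  # 2020-04-03 08:28:37 UTC
print(min(prices), max(prices))                   # 187893.6 195890.9
print(mean(prices))
```

The times and prices lists can be passed directly to a plotting library, for example matplotlib's plt.plot(times, prices).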
