1

I’m trying to build a scraper in Python that gets a variable from JavaScript code within the HTML of a webpage. This variable changes over time. Here is the JavaScript code; I need the first number of the yValues variable:

jQuery(document).ready(function() {
  var draw = true;
  
  if ("Biblioteca di Ingegneria" == "") {
    draw = false;
  }
  
  if (draw) {
    var yValues = [
        "28",
        "100"
      ];
    var Titolo = "Biblioteca di Ingegneria";
    var sottoTitolo = "Posti Totali: 128";
    var barColors = [
        "#167d21",
        "#ed2135"
      ];
    var xValues = [
        "Liberi (28)",
        "Occupati (100)"
      ];
    
    new Chart("InOutChart", {
      type: "pie",
      data: {
        labels: xValues,
        datasets: [
          {
            backgroundColor: barColors,
            data: yValues
          }
        ]
      },
      options: {
        plugins: {
          title: {
            display: true,
            text: Titolo,
            font: {
              size: 25,
              style: 'normal',
              lineHeight: 1.2
            },
            // padding: {
            //   top: 10,
            //   bottom: 30
            // }
          },
          subtitle: {
            display: true,
            text: sottoTitolo,
            font: {
              size: 20,
              style: 'normal',
              lineHeight: 1.2
            },
            padding: {
              bottom: 30
            }
          },
          legend: {
            display: true,
            position: "bottom",
            labels: {
              font: {
                size: 20,
                style: 'normal',
                lineHeight: 1.2
              }
            }
          }
        },
        responsive: true,
        maintainAspectRatio: false,
        scales: {
          yAxes: [
            {
              display: true,
              ticks: {
                beginAtZero: true
              }
            }
          ]
        }
      }
    });
  }
});

This is the best I could do:

from bs4 import BeautifulSoup
import requests

# Make a GET request to the URL of the web page.
base_url = 'https://qrbiblio.unipi.it/Home/Chart?IdCat=a96d84ba-46e8-47a1-b947-ab98a8746d6f'
response = requests.get(base_url)

# Parse the HTML content of the page.
soup = BeautifulSoup(response.text, "html.parser")

# Find all the `<script>` elements on the page.
scripts = soup.find_all("script")

# Get the 8th `<script>` element.
script8 = scripts[7]

# Transform the 8th `<script>` into a string.
script8_txt = "".join(script8)

# Get the useful string from the 8th `<script>`.
usefull_txt = script8_txt[248:251]
        
# Get the int from the string.
pl = int("".join(filter(str.isdigit, usefull_txt)))

print(pl)

This works, but I want to automatically parse the JavaScript code to find the variable and get its value, because as you can see I manually checked the position of the characters that I needed. I’m looking for a better solution because I’m planning to use this code for other similar webpages, but the position of the variable changes every time. Last information: I want to put this Python code in an Alexa skill, so I don’t know if Selenium package will work well.

1
  • 1
    Use Esprima. Commented Dec 15, 2022 at 15:10

1 Answer 1

1

Try this:

import ast

import requests
from bs4 import BeautifulSoup

base_url = 'https://qrbiblio.unipi.it/Home/Chart?IdCat=a96d84ba-46e8-47a1-b947-ab98a8746d6f'
response = requests.get(base_url)

script = (
    BeautifulSoup(response.text, "html.parser")
    .find_all("script")[7]
    .string
)
numbers = ast.literal_eval(
    script.strip().split("var yValues = ")[1].split(";")[0]
)
print(numbers)
print(numbers[0])

Output:

['130', '0']
130
Sign up to request clarification or add additional context in comments.

2 Comments

Amazing, i need only the first number so I imported re and put the string in a variable named "value" to get: first_number = int(re.findall(r'\d+', value)[0]) Is there a better way to do it?
@AlbertoAtzori you can use literal_eval to get a list of string values and then get the first one. I've updated the answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.