I am trying to scrape a specific number from this URL: 'https://www.ulb.uni-muenster.de/'. The number is dynamic. Unfortunately, when I search for it I only get the enclosing span with its class, but not the number itself. When I inspect the page in Chrome, I can see the number clearly in the source code. I have tried two approaches:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.ulb.uni-muenster.de/'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
tags = soup.find('span', {'class': 'seatingsCounter'})
print(tags)

Out: <span class="seatingsCounter"></span>

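import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.ulb.uni-muenster.de/')
data = BeautifulSoup(r.content, 'lxml')  # name the parser explicitly
examples = []
for d in data.find_all('a'):
    examples.append(d)
# search the freshly parsed document (data), not soup from the first attempt
my_as = data.find_all('span', {'class': 'seatingsCounter'})
print(my_as)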

Out: [<span class="seatingsCounter"></span>]

Neither approach works: the output is always just the empty span with its class.

If it is dynamic, bs4 will not detect it. You need to use Selenium. Commented Nov 13, 2019 at 12:50

1 Answer

If you look in the page source code, you will see that the number of free places is updated by the JavaScript function showMessage:

var showMessage = function(data) {
                var locations = [ "ZB_LS", "ZB_RS" ];
                var free = 0;
                var total = 0;
                var open = true;
                $('.availableSeatings .spinner').remove();
                $('.availableSeatings .error').data('counter', 0);
                $.each(data.locations, function( key, value ) {
                    if ($.inArray( value.id, locations) !== -1)
                    {
                        free = free + Math.round((100 - value.quota) * value.places/100);
                        total = total + value.places;
                        open = open && value.open;
                    }
                });

                if (open)
                {
                    $('.availableSeatings .message').show().siblings().hide();
                    quota = Math.round(free/total * 100);
                    result = free + '<span class="quota">(' + quota + '%)</span>';
                    date = $.format.date(data.datetime, "dd.MM.yyyy, HH:mm");
                    $('.availableSeatings .seatingsCounter').html(result);  // <- HERE!!
                    $('.availableSeatings .updated .datetime').text(date);
                    $('.availableSeatings .updated').show();
                } else {
                    $('.availableSeatings .closed').show().siblings().hide();
                }
        };
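
For anyone who wants to reproduce the number in Python, here is a minimal sketch of the same arithmetic. The names js_round and summarize are my own and do not come from the page; the sample values are copied from the JSON response quoted in this answer. One detail that is easy to miss: JavaScript's Math.round rounds halves up, while Python's built-in round() rounds halves to the nearest even number, so the sketch uses floor(x + 0.5) instead.

```python
import math

# Same IDs as the `locations` array inside showMessage.
MAIN_LIBRARY_IDS = ("ZB_LS", "ZB_RS")


def js_round(x):
    """Round half up, like JavaScript's Math.round."""
    return math.floor(x + 0.5)


def summarize(locations, wanted=MAIN_LIBRARY_IDS):
    """Return (free, total, is_open), accumulated the way showMessage does."""
    free = 0
    total = 0
    is_open = True
    for loc in locations:
        if loc["id"] in wanted:
            # quota is the percentage of occupied places
            free += js_round((100 - loc["quota"]) * loc["places"] / 100)
            total += loc["places"]
            is_open = is_open and loc["open"]
    return free, total, is_open


# Three entries from the sample response; only the first two are counted.
sample = [
    {"id": "ZB_LS", "open": True, "quota": 99, "places": 678},
    {"id": "ZB_RS", "open": True, "quota": 94, "places": 154},
    {"id": "VSTH", "open": True, "quota": 56, "places": 145},
]

free, total, is_open = summarize(sample)
print(free, total, is_open)  # 16 832 True
```

With these sample values the reading room contributes 7 free places and the research room 9, so the page would have shown 16 at that moment.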

A little further down in the source code you will see this AJAX call:

$.ajax({
            dataType: "json",
            url: "/available-seatings.json",  // <-- THIS LOOKS INTERESTING
            timeout: 40000,
            success: function(data) { showMessage(data); },
            error: function() {
                counter = $('.availableSeatings .error').data('counter');
                if (isNaN(counter) || counter >= 3)
                {
                    showError();
                } else {
                    $('.availableSeatings .error').data('counter', counter + 1);
                }
            },
            complete: function() {
              setTimeout(worker, 60000);
            }
          });

And if we go to https://www.ulb.uni-muenster.de/available-seatings.json then we see something like:

{"datetime":"2019-11-13 13:49:46","locations":[{"id":"ZB_LS","label":"Zentralbibliothek Lesesaal","open":true,"quota":99,"places":678},{"id":"ZB_RS","label":"Zentralbibliothek Recherchesaal","open":true,"quota":94,"places":154},{"id":"VSTH","label":"Bibliothek im Vom-Stein-Haus","open":true,"quota":56,"places":145},{"id":"RWS1","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Einzelarbeitszone","open":true,"quota":98,"places":352},{"id":"RWS1_G","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Gruppenarbeitszone","open":true,"quota":30,"places":40},{"id":"RWS2","label":"Bibliothek im Rechtswissenschaftlichen Seminar II","open":true,"quota":54,"places":162},{"id":"WIWI","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Einzelarbeitszone","open":true,"quota":71,"places":132},{"id":"WIWI_G","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Gruppenarbeitszone","open":true,"quota":98,"places":45},{"id":"ZBSOZ","label":"Zweigbibliothek Sozialwissenschaften","open":true,"quota":74,"places":129},{"id":"FHAUS","label":"Gemeinschaftsbibliothek im F\u00fcrstenberghaus","open":true,"quota":68,"places":197},{"id":"IFE","label":"Bibliothek des Instituts f\u00fcr Erziehungswissenschaft","open":true,"quota":47,"places":183},{"id":"PHI","label":"Bibliotheken im Philosophikum (Domplatz 23)","open":true,"quota":68,"places":98}]}

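Voilà. The number shown on the page is computed from the ZB_LS and ZB_RS entries, exactly as in showMessage. Fetching this JSON directly and decoding it with Python's json module is probably easier than rewriting the scraper to use Selenium, though that would work too.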
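
A minimal sketch of the fetching side, using only the standard library. The function names parse_seatings and fetch_seatings are hypothetical, and I am assuming the endpoint still returns the structure shown above; it may have changed since this was written. The 40-second timeout simply mirrors the 40000 ms in the page's own $.ajax call.

```python
import json
from urllib.request import urlopen

# Absolute form of the relative URL used in the page's $.ajax call.
SEATINGS_URL = "https://www.ulb.uni-muenster.de/available-seatings.json"


def parse_seatings(raw):
    """Decode the JSON text and index the locations by their id."""
    data = json.loads(raw)
    return {loc["id"]: loc for loc in data["locations"]}


def fetch_seatings(url=SEATINGS_URL, timeout=40):
    """Download the JSON document and return its locations keyed by id."""
    with urlopen(url, timeout=timeout) as response:
        return parse_seatings(response.read().decode("utf-8"))
```

Calling fetch_seatings()["ZB_LS"] should then give a dict with the quota and places fields for the reading room, with no HTML parsing involved.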

1 Comment

Thanks for your answer! I'll try the JSON module. :-)
