2

I have a QWebEngine class tor read webpages and create BeautifulSoup for them.

Here is the code:

import sys
from bs4 import BeautifulSoup
import os


from PyQt5 import QtCore, QtWidgets, QtWebEngineWidgets


class WebPage(QtWebEngineWidgets.QWebEnginePage):
    def __init__(self):
        super(WebPage, self).__init__()
        self.loadFinished.connect(self.handleLoadFinished)
        self.soup = []

    def start(self, urls):
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.load(QtCore.QUrl(url))
        return True

    def processCurrentPage(self, html):
        url = self.url().toString()
        self.soup.append(BeautifulSoup(html, 'lxml'))
        if not self.fetchNext():
            QtWidgets.qApp.quit()

    def handleLoadFinished(self):
        self.toHtml(self.processCurrentPage)

Here is another function to call WebPage class:

def get_soup(urls):
    app = QtWidgets.QApplication(sys.argv)
    webpage = WebPage()
    webpage.start(urls)
    return webpage.soup

Here is the main:

if __name__ == "__main__":

    urls = ["http://www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh", "http://www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sz"]      
    soups = get_soup(urls)

However, the program restarts when I executed the program.

What should be changed?

0

1 Answer 1

2

This is a problem that I had already had and analyzing I found that the QApplication is destroyed before QWebEnginePage making the QWebEngineProfile is deleted, and in this case causing QWebEnginePage crashes. The solution is to make the app have a greater scope by making it a global variable.

On the other hand you have to call exec_() so that the eventloop that allows the operation of the signals

# ...
app = None

def get_soup(urls):
    global app
    app = QtWidgets.QApplication(sys.argv)
    webpage = WebPage()
    webpage.start(urls)
    app.exec_()
    return webpage.soup
# ...

Note: It seems that the QTBUG-75547 related to this problem has been solved for Qt5>=5.12.4 so probably in a next release of PyQtWebEngine that bug will no longer be observed.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.