\$\begingroup\$

I am trying to convert a 111 MB text file to PDF quickly. I'm currently using the FPDF library, and large files take about 40 minutes to process. The text file is an IBM carriage control (ANSI) file, which uses the control characters described at the link below. This file in particular repeats those control characters far more often than any other file I've seen. I am also looking for specific text that repeats on every page break and replacing it with a blank space to remove it. With that said, I'm looking to get a second opinion on possibly a more powerful library that will allow me to quickly convert these text files to PDF.

https://www.ibm.com/docs/en/zos/2.4.0?topic=hhua-understanding-how-ansi-machine-carriage-controls-are-used

I've optimized my code as much as possible to speed things up, but big files still take too long. I added ThreadPoolExecutor for parallel processing to handle multiple files at once, and I’m reading files in 20MB chunks instead of all at once to reduce memory usage. I also precompiled regex to make text replacements faster and streamlined how control characters are processed in a single pass. To further improve performance, I introduced batch processing, setting a batch size of 700 to manage workload distribution more efficiently. Smaller files process instantly, but large ones are still a bottleneck.

Text Sample:

1     Send Inquiries To:
-
                                                                               ACCOUNT NUMBER:                  123456789
0                                                                              YTD DIV RECEIVED:          789.01
0                                                                              PAGE NUMBER:                 1 of 1
                                                                                                                     1
0                                                                                  www.spoonmuseum.com
-
         HARRY POTTINGTON
         PROFESSIONAL JUGGLER
         456 PINEAPPLE RD
         UNICORN CITY, UC  98765
-
  Visit our "Spoon Museum" to see the world’s largest collection of spoons!
  Get 10% off when you mention the phrase "I love spoons!"
0                                               SUMMARY OF YOUR ACCOUNTS
0 ______________________________________________________________________________________________________________________
 |                                       |                                       |                                      |
 | SUFFIX 007 BANANA FUND               |                                       |                                      |
 | JOINT: HARRY POTTINGTON              |                                       |                                      |
 | STATEMENT PERIOD 01/15/25 - 01/15/25  |                                       |                                      |
 | BEGINNING BALANCE           3,000.00  |                                       |                                      |
 | DEPOSITS              100.00         |                                       |                                      |
 | WITHDRAWALS           50.00          |                                       |                                      |
 | BANANAS CLEARED        0.00           |                                       |                                      |
 | ENDING BALANCE              3,050.00  |                                       |                                      |
 |                                       |                                       |                                      |
 | PIE YEAR-TO-DATE               20.00   |                                       |                                      |
 | PIE THIS PERIOD             5.00      |                                       |                                      |
 |                                       |                                       |                                      |
  ______________________________________________________________________________________________________________________
0 SUFFIX      007 BANANA FUND
  ______________________________________________________________________________________________________________________
0 DEPOSITS
  --------
    DATE      DESCRIPTION       TRANSACTION AMOUNT   LOCATION
    ----      -----------       ------------------   --------
  01/15/25    DEPOSIT FROM GIGANTIC PIZZA PARTY   100.00   PIZZA WORLD
  ______________________________________________________________________________________________________________________
1     Send Inquiries To:
-
                                                                               ACCOUNT NUMBER:                 987654321
0                                                                              YTD DIV RECEIVED:          5,000.00
0                                                                              PAGE NUMBER:                 2 of 2
                                                                                                                     2
0                                                                                  www.cactuslovers.com
-
         WALTER GUMMY
         PROFESSIONAL ICE CREAM TASTER
         123 FROSTY LN
         ICECREAMVILLE, IV  54321
-
  Join the "Cactus Lovers Club" for exclusive cactus-themed merchandise and discounts.
  Visit our website to see the world's largest cactus collection!
0                                               SUMMARY OF YOUR ACCOUNTS
0 ______________________________________________________________________________________________________________________
 |                                       |                                       |                                      |
 | SUFFIX 002 MYSTERY COINS             |                                       |                                      |
 | JOINT: WALTER GUMMY                  |                                       |                                      |
 | STATEMENT PERIOD 02/20/25 - 02/20/25  |                                       |                                      |
 | BEGINNING BALANCE           2,500.00  |                                       |                                      |
 | DEPOSITS              200.00         |                                       |                                      |
 | WITHDRAWALS           100.00         |                                       |                                      |
 | ICE CREAM CLEARED      0.00           |                                       |                                      |
 | ENDING BALANCE              2,600.00  |                                       |                                      |
 |                                       |                                       |                                      |
 | CUPCAKE YEAR-TO-DATE            50.00  |                                       |                                      |
 | CUPCAKE THIS PERIOD            10.00  |                                       |                                      |
 | AVERAGE CHOCOLATE COIN BALANCE   1,000.00 |                                       |                                      |
 | DAYS ICE CREAM TASTED            5   |                                       |                                      |
 | ANNUAL ICE CREAM TASTER REWARD    10.00% |                                       |                                      |
 |                                       |                                       |                                      |
  ______________________________________________________________________________________________________________________
0 SUFFIX      002 MYSTERY COINS
  ______________________________________________________________________________________________________________________
0  HISTORY
   -------
    DATE      DESCRIPTION       TRANSACTION AMOUNT   ACCOUNT BALANCE
    ----      -----------       ------------------   ---------------
  02/20/25    DEPOSIT FROM MARSHMALLOW FACTORY  200.00   2,600.00
  A GUMMY REWARD OF       10.00 WILL BE POSTED TO YOUR ACCOUNT ON 02/20/25
  ______________________________________________________________________________________________________________________

Required Libraries:

pip install PyQt6 fpdf

Note:

When creating a profile you can set the font to 8.0 and cell height to 4.0.

Code:

import sys
import os
import re
import shutil
import sqlite3
from pathlib import Path
from datetime import datetime
from fpdf import FPDF
from PyQt6 import QtCore, QtWidgets
from PyQt6.QtGui import QIcon
from PyQt6.QtCore import QThread, pyqtSignal, Qt
from PyQt6.QtWidgets import (
    QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, 
    QPushButton, QLabel, QComboBox, QStatusBar, QMessageBox,
    QInputDialog, QListWidget, QProgressBar, QApplication, QMainWindow
)


# Define Directories
base_dir = Path(r'C:\path\to\base\dir')
input_dir = base_dir / '01-Input'
output_dir = base_dir / '02-Output'
processed_dir = base_dir / '03-Processed'
db_path = base_dir / 'profiles.db'

# Initialize the database
def init_db(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(""" 
        CREATE TABLE IF NOT EXISTS profiles (
            name TEXT PRIMARY KEY,
            font_size REAL,
            cell_height REAL
        )
    """)
    conn.commit()
    conn.close()

def fetch_profiles(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name, font_size, cell_height FROM profiles")
    profiles = {row[0]: {"font_size": row[1], "cell_height": row[2]} for row in cursor.fetchall()}
    conn.close()
    return profiles

def add_profile(db_path, name, font_size, cell_height):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute(""" 
        INSERT INTO profiles (name, font_size, cell_height) 
        VALUES (?, ?, ?) 
        ON CONFLICT(name) DO UPDATE SET 
        font_size = excluded.font_size, 
        cell_height = excluded.cell_height
    """, (name, font_size, cell_height))
    conn.commit()
    conn.close()

def delete_profile_from_db(db_path, name):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("DELETE FROM profiles WHERE name = ?", (name,))
    conn.commit()
    conn.close()

# IBM Data Processing Logic
# Precompile the regular expressions for better performance
form_feed_pattern = re.compile(r'\$DJDE FORMS=NONE,FEED=(MAIN|AUX),FORMAT=(PAGE1|PAGE2),END;')

def replacement(content):
    # Remove the specific forms and feeds using a single regex
    content = form_feed_pattern.sub('', content)
    
    # Remove the first character if it starts with '2' or '3'
    if content.startswith(('2', '3')):
        content = content[1:]
    
    # Remove the period if it ends with ' .'
    if content.endswith(' .'):
        content = content.rstrip(' .')
    
    return content

def process_ibm_data(file_path):
    formatted_lines = []
    page_started = False
    page_count = 0
    
    # Open the file in read mode
    with open(file_path, 'r', encoding='cp1252') as file:
        # Read file in chunks
        chunk_size = 20 * 1024 * 1024  # 20MB chunk size
        while chunk := file.read(chunk_size):
            # Process each chunk
            lines = chunk.splitlines()
            for line in lines:
                
                control_char = line[0]
                content = line[1:].rstrip()
                content = replacement(content)  # Apply all replacements in one pass

                # Process the line based on control character
                if control_char == '1':
                    if page_started:
                        formatted_lines.append("\f")  # Add page break for each page
                    page_started = True
                    page_count += 1
                    formatted_lines.append(content)
                elif control_char == '0':
                    formatted_lines.append("")  # Add an empty line
                    formatted_lines.append(content)
                elif control_char == '-':
                    formatted_lines.append("")  # Add two empty lines for spacing
                    formatted_lines.append("")
                    formatted_lines.append(content)
                elif control_char == '+':
                    if formatted_lines:
                        formatted_lines[-1] += content  # Append to the last line
                    else:
                        formatted_lines.append(content)
                elif control_char == ' ':
                    if not page_started:
                        page_started = True
                        page_count += 1
                    formatted_lines.append(content)

    return formatted_lines, page_count

# Create PDF logic
def create_pdf(output_path, lines, profile):
    pdf = FPDF(format='letter')
    pdf.set_auto_page_break(auto=True, margin=0.5)
    pdf.set_margins(left=5, top=0.5, right=0.5)
    pdf.add_page()
    pdf.set_font("Courier", size=profile["font_size"])
    
    for line in lines:
        if line == "\f":  # Add a new page when page break is encountered
            pdf.add_page()
        else:
            pdf.multi_cell(0, profile["cell_height"], line, align="L") #Aligns text to the left
    
    pdf.output(output_path)

# Check if a page is blank or only contains the number '1'
def is_blank_or_single_number_page(lines):
    return all(line.strip() == "" or line.strip() == "1" for line in lines)

from concurrent.futures import ThreadPoolExecutor, as_completed

class FileProcessorThread(QThread):
    processing_done = pyqtSignal()
    processing_error = pyqtSignal(str)
    processed_files = pyqtSignal(str, str, str, int)
    progress_updated = pyqtSignal(int)

    def __init__(self, input_files, output_dir, processed_subdir, profile, batch_size=500, parent=None):
        super().__init__(parent)
        self.input_files = input_files
        self.output_dir = output_dir
        self.processed_subdir = processed_subdir
        self.profile = profile
        self.batch_size = batch_size

    def process_file(self, file_path):
        current_date = datetime.now().strftime('%Y%m%d')
        blank_page_count = 0
        file_stem = file_path.stem
        output_file_name = f'{file_stem}_{current_date}_Cleansed.PDF'
        output_file_path = self.output_dir / output_file_name

        # Process the IBM data and get the formatted lines and page count
        formatted_lines, page_count = process_ibm_data(file_path)

        # Count the total number of pages
        total_pdf_pages = sum(1 for line in formatted_lines if line == "\f")

        non_blank_lines = []
        current_page_lines = []
        current_page_num = 0

        for line in formatted_lines:
            if line == "\f":
                if is_blank_or_single_number_page(current_page_lines):
                    blank_page_count += 1
                else:
                    non_blank_lines.extend(current_page_lines)
                    non_blank_lines.append("\f")
                    current_page_num += 1
                    progress = int((current_page_num / total_pdf_pages) * 100)
                    self.progress_updated.emit(progress)
                current_page_lines = []
            else:
                current_page_lines.append(line)

        if not is_blank_or_single_number_page(current_page_lines):
            non_blank_lines.extend(current_page_lines)

        if current_page_lines:
            current_page_num += 1
            progress = int((current_page_num / total_pdf_pages) * 100)
            self.progress_updated.emit(progress)

        create_pdf(output_file_path, non_blank_lines, self.profile)

        shutil.move(str(file_path), self.processed_subdir / file_path.name)
        self.processed_files.emit(file_path.name, str(page_count), output_file_name, blank_page_count)
        return blank_page_count

    def run(self):
        try:
            blank_page_count = 0
            total_files = len(self.input_files)

            # Using ThreadPoolExecutor for parallel processing of files
            with ThreadPoolExecutor() as executor:
                futures = [executor.submit(self.process_file, file) for file in self.input_files]
                
                for future in as_completed(futures):
                    blank_page_count += future.result()

            # print(f"Total blank pages removed: {blank_page_count}\n")
            self.processing_done.emit()

        except Exception as e:
            self.processing_error.emit(str(e))

class IBMFileProcessorApp(QMainWindow):
    def __init__(self):
        super().__init__()

        # Set up main window
        self.setWindowTitle("test")
        self.setWindowIcon(QIcon(r"python_scripts\ibm carriage control\assets\icons\letter-r.ico"))
        self.setGeometry(450, 250, 968, 394)

        # Initialize database
        init_db(db_path)

        # Set up central widget and layout
        self.central_widget = QWidget(self)
        self.setCentralWidget(self.central_widget)
        self.layout = QVBoxLayout(self.central_widget)

        # Set up the splitter for the two sections (left for file explorer, right for dropped files and progress)
        self.splitter = QtWidgets.QSplitter(self)
        self.splitter.setOrientation(QtCore.Qt.Orientation.Horizontal)
        self.layout.addWidget(self.splitter)

        # Create UI Elements
        self.create_left_side()    # Left side: drag-and-drop file explorer
        self.create_right_side()   # Right side: dropped files list and progress bar
        self.create_control_area()
        self.create_file_list_area()
        self.create_status_bar()

    def create_left_side(self):
        # Left side: Original file explorer logic
        self.left_widget = QWidget(self.splitter)
        self.left_layout = QVBoxLayout(self.left_widget)
        
        self.drop_area_label = QLabel("Drag and Drop Files Here", self)
        self.drop_area_label.setAlignment(Qt.AlignmentFlag.AlignCenter)
        self.drop_area_label.setStyleSheet("background-color: #1988ea; font: bold 12pt Arial; padding: 20px;")
        self.left_layout.addWidget(self.drop_area_label)

        self.drop_area_label.setAcceptDrops(True)
        self.drop_area_label.dragEnterEvent = self.drag_enter_event
        self.drop_area_label.dragMoveEvent = self.drag_move_event
        self.drop_area_label.dropEvent = self.drop_event

    def create_right_side(self):
        # Right side: Display dropped file name(s) and progress bar
        self.right_widget = QWidget(self.splitter)
        self.right_layout = QVBoxLayout(self.right_widget)

        # Label for the dropped files
        self.dropped_files_label = QLabel("Dropped Files", self)
        self.dropped_files_label.setAlignment(Qt.AlignmentFlag.AlignCenter)
        self.right_layout.addWidget(self.dropped_files_label)

        # List widget to display the dropped file names
        self.dropped_files_list = QListWidget(self)
        self.right_layout.addWidget(self.dropped_files_list)

        # Progress bar for showing the file processing progress
        self.progress_bar = QProgressBar(self)
        self.progress_bar.setRange(0, 100)
        self.right_layout.addWidget(self.progress_bar)

    def drag_enter_event(self, event):
        # Only accept text files for drag-and-drop
        if event.mimeData().hasUrls():
            urls = event.mimeData().urls()
            for url in urls:
                if url.toLocalFile().lower().endswith('.txt'):
                    self.drop_area_label.setStyleSheet("background-color: #44b8ff; font: bold 12pt Arial; padding: 20px; border: 3px solid #005c99;")
                    event.acceptProposedAction()
                    return
        event.ignore()  # Ignore if it's not a text file

    def drag_move_event(self, event):
        # This keeps the hover effect while moving the file over the drop area
        if event.mimeData().hasUrls():
            urls = event.mimeData().urls()
            for url in urls:
                if url.toLocalFile().lower().endswith('.txt'):
                    self.drop_area_label.setStyleSheet("background-color: #44b8ff; font: bold 12pt Arial; padding: 20px; border: 3px solid #005c99;")
                    event.accept()
                    return
        event.ignore()

    def drop_event(self, event):
        # Reset style and handle file drop
        self.drop_area_label.setStyleSheet("background-color: #1988ea; font: bold 12pt Arial; padding: 20px;")
        files = event.mimeData().urls()
        for url in files:
            file_path = Path(url.toLocalFile())
            shutil.move(file_path, input_dir / file_path.name)
        
        # Update the dropped files list in the right panel
        self.update_dropped_files_list()

    def update_dropped_files_list(self):
        # This will update the dropped files list on the right panel
        self.dropped_files_list.clear()
        for file in os.listdir(input_dir):
            self.dropped_files_list.addItem(file)

    def create_control_area(self):
        self.control_area = QWidget(self)
        self.control_layout = QHBoxLayout(self.control_area)
        self.layout.addWidget(self.control_area)

        # Profile Dropdown
        self.profile_dropdown = QComboBox(self)
        self.profile_dropdown.addItems(self.get_profile_names())
        self.control_layout.addWidget(self.profile_dropdown)

        # Buttons
        self.create_profile_button = QPushButton("New Profile", self)
        self.create_profile_button.clicked.connect(self.create_new_profile)
        self.control_layout.addWidget(self.create_profile_button)

        self.edit_profile_button = QPushButton("Edit Profile", self)
        self.edit_profile_button.clicked.connect(self.edit_profile)
        self.control_layout.addWidget(self.edit_profile_button)

        self.delete_profile_button = QPushButton("Delete Profile", self)
        self.delete_profile_button.clicked.connect(self.delete_profile)
        self.control_layout.addWidget(self.delete_profile_button)

        self.process_button = QPushButton("Process Files", self)
        self.process_button.clicked.connect(self.process_files)
        self.control_layout.addWidget(self.process_button)

    def create_file_list_area(self):
        self.file_list_label = QLabel("Processed Files", self)
        self.file_list_label.setAlignment(Qt.AlignmentFlag.AlignCenter)
        self.layout.addWidget(self.file_list_label)

        self.files_list = QListWidget(self)
        self.files_list.itemDoubleClicked.connect(self.open_file_or_directory)
        self.layout.addWidget(self.files_list)

        self.refresh_button = QPushButton("Refresh", self)
        self.refresh_button.clicked.connect(self.update_files_list)
        self.layout.addWidget(self.refresh_button)

    def create_status_bar(self):
        self.status_bar = QStatusBar(self)
        self.setStatusBar(self.status_bar)

    def create_progress_bar(self):
        self.progress_bar = QProgressBar(self)
        self.progress_bar.setRange(0, 100)
        self.layout.addWidget(self.progress_bar)

    def get_profile_names(self):
        profiles = fetch_profiles(db_path)
        return list(profiles.keys())

    def update_files_list(self):
        self.files_list.clear()
        for file in os.listdir(output_dir):
            self.files_list.addItem(file)

    def create_new_profile(self):
        name, ok = QInputDialog.getText(self, "New Profile", "Enter the profile name:")
        if ok and name:
            font_size, ok = QInputDialog.getDouble(self, "Font Size", "Enter font size:", min=1)
            if ok:
                cell_height, ok = QInputDialog.getDouble(self, "Cell Height", "Enter cell height:", min=1)
                if ok:
                    add_profile(db_path, name, font_size, cell_height)
                    self.profile_dropdown.addItem(name)
                    QMessageBox.information(self, "Success", f"Profile '{name}' created.")
                else:
                    QMessageBox.warning(self, "Invalid Input", "Please enter a valid cell height.")
            else:
                QMessageBox.warning(self, "Invalid Input", "Please enter a valid font size.")
        else:
            QMessageBox.warning(self, "Invalid Input", "Profile name cannot be empty.")

    def edit_profile(self):
        name = self.profile_dropdown.currentText()
        if name:
            profiles = fetch_profiles(db_path)
            font_size, ok = QInputDialog.getDouble(self, "Font Size", "Enter the new font size:", value=profiles[name]["font_size"], min=1)
            if ok:
                cell_height, ok = QInputDialog.getDouble(self, "Cell Height", "Enter the new cell height:", value=profiles[name]["cell_height"], min=1)
                if ok:
                    add_profile(db_path, name, font_size, cell_height)
                    self.profile_dropdown.setItemText(self.profile_dropdown.currentIndex(), name)
                    QMessageBox.information(self, "Success", f"Profile '{name}' updated.")
                else:
                    QMessageBox.warning(self, "Invalid Input", "Please enter a valid cell height.")
            else:
                QMessageBox.warning(self, "Invalid Input", "Please enter a valid font size.")
        else:
            QMessageBox.warning(self, "Select Profile", "Please select a profile to edit.")

    def delete_profile(self):
        name = self.profile_dropdown.currentText()
        if name:
            reply = QMessageBox.question(self, "Delete Profile", f"Are you sure you want to delete profile '{name}'?", 
                                        QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No)
            if reply == QMessageBox.StandardButton.Yes:
                delete_profile_from_db(db_path, name)
                self.profile_dropdown.removeItem(self.profile_dropdown.currentIndex())
                QMessageBox.information(self, "Success", f"Profile '{name}' deleted.")
                self.profile_dropdown.clear()
                self.profile_dropdown.addItems(self.get_profile_names())  # Refresh profile list
        else:
            QMessageBox.warning(self, "Select Profile", "Please select a profile to delete.")
        
    def process_files(self):
        # Get selected profile
        selected_profile_name = self.profile_dropdown.currentText()
        if not selected_profile_name:
            QMessageBox.warning(self, "No Profile Selected", "Please select a profile before processing files.")
            return
        
        profiles = fetch_profiles(db_path)
        profile = profiles[selected_profile_name]
        
        input_files = list(input_dir.glob("*.txt"))
        
        if not input_files:
            QMessageBox.warning(self, "No Files", "No files to process.")
            return
        
        # Disable the process button while processing
        self.process_button.setDisabled(True)
        
        # Start the file processing in a separate thread with a specified batch size (e.g., 5 files per batch)
        self.processor_thread = FileProcessorThread(input_files, output_dir, processed_dir, profile, batch_size=700)
        self.processor_thread.progress_updated.connect(self.update_progress_bar)
        self.processor_thread.processing_done.connect(self.processing_done)
        self.processor_thread.processing_error.connect(self.processing_error)
        self.processor_thread.processed_files.connect(self.file_processed)
        self.processor_thread.start()

    def update_progress_bar(self, progress):
        self.progress_bar.setValue(progress)

    def processing_done(self):
        QMessageBox.information(self, "Processing Complete", "All files have been processed.")
        self.process_button.setDisabled(False)
        self.progress_bar.setValue(0)

    def processing_error(self, error_message):
        QMessageBox.critical(self, "Error", f"An error occurred during processing: {error_message}")
        self.process_button.setDisabled(False)

    def file_processed(self, file_name, page_count, output_file_name, blank_page_count):
        # Add info to status bar about the processed file
        self.status_bar.showMessage(f"Processed: {file_name}, Pages: {page_count}, Output: {output_file_name}, Blank Pages Removed: {blank_page_count}")

    def open_file_or_directory(self, item):
        # Open the file or directory when clicked
        file_path = output_dir / item.text()
        if file_path.is_dir():
            os.startfile(file_path)
        else:
            os.startfile(file_path)

if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = IBMFileProcessorApp()
    window.show()
    sys.exit(app.exec())
\$\endgroup\$

1 Answer

\$\begingroup\$

get a second opinion on possibly a more powerful library that will allow me to quickly convert these text files to PDF.

I don't have a library to recommend, but perhaps others will. In the meantime, keep researching.

There are a few other things you can try to isolate the speed problem. For example, it is not clear why SQL (the sqlite3 profile database) is involved in translating a plain text file to PDF. If SQL is not essential to the conversion, remove it temporarily to see if anything improves.

Also, it is not clear why you need the PyQt6 GUI for the conversion to PDF. Again, remove the GUI temporarily. That is unlikely to speed anything up, but at least it isolates the problem.
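
One way to take the GUI and the database out of the picture is a small command-line harness that times each stage on its own. This is only a sketch under assumptions: `time_stages`, `parse_stage` and `render_stage` are hypothetical names, and the two stand-in stages would be replaced by your real `process_ibm_data` and `create_pdf` calls with a hard-coded profile (for example font size 8.0 and cell height 4.0).

```python
import time


def time_stages(stages, first_input):
    """Run each (name, func) stage in order, feeding each result to the next.

    Returns the final result and a dict mapping stage name to elapsed seconds.
    """
    timings = {}
    value = first_input
    for name, func in stages:
        start = time.perf_counter()
        value = func(value)
        timings[name] = time.perf_counter() - start
    return value, timings


# Hypothetical stand-ins; swap in the real parsing and PDF-writing functions.
def parse_stage(text):
    return text.splitlines()


def render_stage(lines):
    return len(lines)


if __name__ == "__main__":
    sample = "1 first page\n  body line\n1 second page\n"
    result, timings = time_stages(
        [("parse", parse_stage), ("render", render_stage)], sample
    )
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f} s")
```

If nearly all of the time lands in the rendering stage, that points at the PDF library rather than at the text parsing.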

Finally, use a profiling tool, such as Python's built-in cProfile, to see if anything is taking longer than expected.
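
As a rough sketch of what that could look like with the standard library's `cProfile` and `pstats` modules: `profile_call` and `busy_work` are hypothetical names used for illustration, and in practice you would pass in your own function (for instance the one that builds the PDF) together with its arguments.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Hypothetical workload standing in for the real conversion function.
def busy_work(n):
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(busy_work, 100_000)
    print(report)
```

Sorting by cumulative time puts the functions that dominate the run near the top of the report, which is usually the quickest way to see where the 40 minutes go.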

The remaining suggestions are purely for code style.

Layout

Move the classes to the top, after the import lines, and move the other functions after the classes. Having the functions in the middle of the code interrupts its natural flow (from a human readability standpoint).

Also, move this import to the top with all the other import lines:

from concurrent.futures import ThreadPoolExecutor, as_completed

Documentation

The PEP 8 style guide recommends adding docstrings for classes and functions (PEP 257 covers the conventions). Each class docstring should summarize the purpose of the class.

For functions, you can convert comments like this:

# Initialize the database
def init_db(db_path):

into docstrings:

def init_db(db_path):
    """ Initialize the database """

The docstring should also say what kind of database this is, what it stores, and what state it is initialized to.

Other function docstrings should describe input types and return types.
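
For instance, `fetch_profiles` might be documented along these lines. This is a sketch, not a required style: the docstring wording and the Google-style `Args`/`Returns` layout are my own choices, and I have also assumed a `try`/`finally` so the connection is closed even if the query fails, which your original does not do.

```python
import sqlite3


def fetch_profiles(db_path):
    """Load every saved PDF layout profile from the SQLite database.

    Args:
        db_path (str | pathlib.Path): Location of the SQLite file that
            holds the ``profiles`` table.

    Returns:
        dict[str, dict[str, float]]: Maps each profile name to a dict with
        the keys ``"font_size"`` and ``"cell_height"``.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT name, font_size, cell_height FROM profiles"
        )
        return {
            name: {"font_size": font_size, "cell_height": cell_height}
            for name, font_size, cell_height in cursor
        }
    finally:
        conn.close()
```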

DRY

There are duplicate lines in the open_file_or_directory function. The startfile call is the same in both branches of the if/else:

    if file_path.is_dir():
        os.startfile(file_path)
    else:
        os.startfile(file_path)

Unless that is a bug, this code does the same thing without the repetition:

def open_file_or_directory(self, item):
    # Open the file or directory when clicked
    file_path = output_dir / item.text()
    os.startfile(file_path)

Tools

You could run code development tools to automatically find some style issues with your code.

ruff finds things like:

F811 [*] Redefinition of unused `QApplication` from line
|
|     QApplication, QMainWindow, QWidget, QVBoxLayout, QHBoxLayout, 
|     QPushButton, QLabel, QComboBox, QStatusBar, QMessageBox,
|     QInputDialog, QListWidget, QProgressBar, QApplication, QMainWindow
|                                              ^^^^^^^^^^^^ F811
| )
|
= help: Remove definition: `QApplication`

Also:

= help: Remove definition: `QMainWindow`
\$\endgroup\$
