0

I'm trying to connect to many Pervasive databases using pyodbc and then perform a sql query on each database. The current problem Im facing is that this script takes too long to run because its trying to connect and run a sql query 1 at a time. I thought multi threading might be a good solution for this. Im very new to multi threading and was wondering what the best approach for something like this would be?

Any tips would be GREATLY appreciated, thanks for looking.

My connection looks something like this:

import pyodbc
import pandas as pd
import logging

server = '1.1.1.1:111'
database = 'ABC'
username = 'test'
password = 'test123'
list_of_databases = list(large_list)
sql  = "SELECT * FROM Table where Date between '20230226' and '20230227'

my loop looks like this where I get all of the connections and establish a connection w/ all databases.

def connect_to_pervasive(databases, server):

    try:
        
        logger.info("connecting to Pervasive Server")
        connect_string = 'DRIVER=Pervasive ODBC Interface;SERVERNAME={server_address};DBQ={db}'
        connections = [pyodbc.connect(connect_string.format(db=n, server_address=server)) for n in databases]
        cursors = [conn.cursor() for conn in connections]
        logger.info("Connection established!")
        
    except Exception as e:
        logger.critical(f"Error: {e}")

    return cursors

Here is where I think I should do my multi threading. Right now this is opening MANY connections, not closing connections after they are run and going 1 at a time. Ideally id want multiple connections happening. Any tips would be GREATLY appreciated, thanks for looking.

data = []
try:
    for cur in connections:
        rows = cur.execute(sql).fetchall()
        df = pd.DataFrame.from_records(rows, columns=[col[0] for col in cur.description])
        data.append(df)

except Exception as e:
    print(e)
    logger.critical(f'Error: {e}')

finally:
    for cur in connections:
        cur.close()
2
  • 1
    Take a look at ThreadPoolExecutor in the concurrent.futures module. You should check documentation for Pervasive ODBC connections to see if they're thread-safe (they may not be). Consider establishing and closing a connection per thread Commented Feb 28, 2023 at 6:31
  • Thanks for the tips Pingu. Can you help me get started with what the script would look like? Commented Feb 28, 2023 at 20:01

1 Answer 1

1

I am not too clear on exactly what you want to do since the code you have doesn't quite match up with your comments. But if you are just doing one database SELECT per connection, then I would think that something like the following will do what you want. If you want something else, I am sure will get back to me ...

from conucrrent.futures import ThreadPoolExecutor

...
# Rest of code as before except new definition of
# connect_to_pervasive:

# Presumably these values are for every database:
server = '1.1.1.1:111'
sql  = "SELECT * FROM Table where Date between '20230226' and '20230227'"

def connect_to_pervasive(database):
    """ Connect to a single database. """
    try:      
        logger.info("connecting to Pervasive Server")
        connect_string = 'DRIVER=Pervasive ODBC Interface;SERVERNAME={server_address};DBQ={db}'
        connection = pyodbc.connect(connect_string.format(db=database, server_address=server))
        logger.info("Connection established!")
        return connection
        
    except Exception as e:
        logger.critical(f"Error: {e}")
        raise # re-raise the exception

def worker(database):
    """ This creates connection, cursor, retrieves rows, closes connection, etc. """
    connection = None
    cursor = None
    try:
        # Same server for every database?
        # If so and server is defined as '1.1.1.1:111', then there is no real
        # need to pass server as an argument:
        connection = connect_to_pervasiveness(database)
        cursor = connection.cursor()
        rows = cursor.execute(sql).fetchall()
        return pd.DataFrame.from_records(rows, columns=[col[0] for col in cursor.description])
    except Exception as e:
        print(e)
        logger.critical(f'Error: {e}')
        return None
    finally:
        if connection:
            if cursor:
                cursor.close()
            connection.close()

with ThreadPoolExecutor(len(databases)) as executor:
    # If the worker gets an exception, then the corresponding list element
    # will be None.
    data = list(executor.map(worker, databases))
Sign up to request clarification or add additional context in comments.

5 Comments

this is awesome and very helpful! Thank you. Its working but I am getting performance issues. Im wondering if its taking a long time to load the results into a dataframe quickly? Also it appears that the connection is not stopping even though I finished processing.
For one thing, you have a discrepancy in the code you posted. You have two variables, connections holding connections and cursors holding cursors. Yet you had for cur in connections: instead of for cur in cursors:, which I blindly copied. Let me update the code for it to make more sense (at least to me). If you are explicitly closing the connections at the end, then I don't understand how they wouldn't be closed unless these connections are not thread safe and that is causing the issue. I will update the answer to try something a bit different.
hey thanks again this is very helpful! Ah the reason for multiple "connections" is because I have a list of databases that I want to connect w not just 1 at a time. I think to get your script working I would need to include a for loop before your connection variable. so like for conn in databases?
@aero8991 It appears from what you originally posted that you want to do the same SELECT to create a dataframe for each of some number of databases. Let's call that number N. This code submits N "tasks" to the thread pool where each task concurrently connects to its specific database once to create its specific dataframe. What loop would you need? All N connections and dataframes will be created concurrently, i.e. the worker function id called N times each with a different element of databases.
with the help of your code I got it working. Thank you so much Booboo!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.