I'm trying to connect to many Pervasive databases using pyodbc and then run a SQL query against each one. The problem I'm facing is that the script takes too long to run because it connects and runs the query one database at a time. I thought multithreading might be a good solution for this. I'm very new to multithreading and was wondering what the best approach for something like this would be?
Any tips would be GREATLY appreciated, thanks for looking.
My connection looks something like this:
import pyodbc
import pandas as pd
import logging

logger = logging.getLogger(__name__)
server = '1.1.1.1:111'
database = 'ABC'
username = 'test'
password = 'test123'
list_of_databases = list(large_list)
sql = "SELECT * FROM Table where Date between '20230226' and '20230227'"
My loop looks like this; it establishes a connection to every database up front:
def connect_to_pervasive(databases, server):
    connect_string = 'DRIVER=Pervasive ODBC Interface;SERVERNAME={server_address};DBQ={db}'
    cursors = []
    try:
        logger.info("Connecting to Pervasive server")
        connections = [pyodbc.connect(connect_string.format(db=n, server_address=server)) for n in databases]
        cursors = [conn.cursor() for conn in connections]
        logger.info("Connection established!")
    except Exception as e:
        logger.critical(f"Error: {e}")
    return cursors
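On the connection-leak side, one pattern I'm considering is closing each connection as soon as its query finishes, using `contextlib.closing`, which closes even if the query raises. Untested sketch; `FakeConn` is just a stand-in object since I can't run this against Pervasive here (a real pyodbc connection has the same `close()` method):

```python
from contextlib import closing

class FakeConn:
    """Stand-in for a pyodbc connection, only to demonstrate the pattern."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

conn = FakeConn()  # with pyodbc this would be pyodbc.connect(connect_string)
with closing(conn):
    pass  # run the query here, e.g. conn.cursor().execute(sql).fetchall()
# the connection is closed on exit from the with-block, error or not
```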
Here is where I think the multithreading should go. Right now this opens MANY connections, never closes them after the queries run, and processes them one at a time. Ideally I'd want multiple queries running concurrently.
data = []
try:
    for cur in cursors:
        rows = cur.execute(sql).fetchall()
        df = pd.DataFrame.from_records(rows, columns=[col[0] for col in cur.description])
        data.append(df)
except Exception as e:
    logger.critical(f'Error: {e}')
finally:
    for cur in cursors:
        cur.close()
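The direction I'm leaning toward (untested sketch): make each database a self-contained unit of work, connect, query, close, and run those units on a `concurrent.futures.ThreadPoolExecutor` so the network waits overlap. The helper below is generic; the commented wiring at the bottom shows how I'd plug in pyodbc, assuming the `server`, `sql`, and `list_of_databases` values from above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_parallel(items, worker, max_workers=8):
    """Run worker(item) for every item on a thread pool.
    Returns {item: result}; a failure in one item is logged
    and skipped rather than aborting the whole batch."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for fut in as_completed(futures):
            item = futures[fut]
            try:
                results[item] = fut.result()
            except Exception as e:
                print(f"{item}: {e}")
    return results

# Hypothetical pyodbc worker (relies on pyodbc, pd, server, sql from above):
# def fetch_one_db(db_name):
#     conn = pyodbc.connect(
#         f'DRIVER=Pervasive ODBC Interface;SERVERNAME={server};DBQ={db_name}')
#     try:
#         cur = conn.cursor()
#         rows = cur.execute(sql).fetchall()
#         return pd.DataFrame.from_records(
#             rows, columns=[col[0] for col in cur.description])
#     finally:
#         conn.close()
#
# dataframes = run_parallel(list_of_databases, fetch_one_db)
```

Is something along these lines a reasonable approach, and is 8 workers a sane default, or should the pool size relate to the number of databases?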