I have the following Python code, which executes 5 queries in a row. The average runtime per query is around 181.1 seconds (~3 minutes), and the total runtime for all 5 queries is 905.4 seconds (~15 minutes). Eventually, after loading the data into DataFrames, I am going to perform ETL work (mostly looking for errors, data quality issues, and inconsistencies), but before that I want to try to leverage multiprocessing to cut down the runtime. I am not familiar with multiprocessing in Python, so I have been reading about different approaches (Queue vs. Pool, etc.). Which approach would work best for this workflow, and how would I implement it? Ideally, I would like a multiprocessing version of this code, or a guide to get there.
Thank you.
EDIT: in case I wasn’t clear, I want to run all 5 queries concurrently. Something that may be an issue is appending each DataFrame to the list concurrently, so if I need to I am willing to forgo that.
import pandas as pd
import psycopg2
import time
import os
host = os.environ["DBHOST"]
user = os.environ["DBUSER"]
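password = os.environ["DBPWD"]  # renamed: "pass" is a reserved keyword in Python
db_conn = psycopg2.connect("host='{}' port={} dbname='{}' user={} password={}".format(
    host,
    port,  # placeholder: substitute the actual port number
    "db_name",
    user,
    password))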
query_load = [("SELECT column_name_1, COUNT(*) "
               "FROM schema.table "
               "GROUP BY column_name_1 "
               "ORDER BY column_name_1 ASC"),
              ("SELECT column_name_2, COUNT(*) "
               "FROM schema.table "
               "GROUP BY column_name_2 "
               "ORDER BY column_name_2 ASC"),
              ("SELECT column_name_3, COUNT(*) "
               "FROM schema.table "
               "GROUP BY column_name_3 "
               "ORDER BY column_name_3 ASC"),
              ("SELECT column_name_4, COUNT(*) "
               "FROM schema.table "
               "GROUP BY column_name_4 "
               "ORDER BY column_name_4 ASC"),
              ("SELECT column_name_5, COUNT(*) "
               "FROM schema.table "
               "GROUP BY column_name_5 "
               "ORDER BY column_name_5 ASC")]
start_time = time.time()
data_load = []
# Run each query sequentially and collect the resulting DataFrames
for query in query_load:
    data_load.append(pd.read_sql(query, db_conn))
elapsed_time = time.time() - start_time
print("Job finished in {} seconds".format(elapsed_time))
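
For reference, a minimal sketch of what a concurrent version of the loop above might look like. It uses a thread pool (`concurrent.futures.ThreadPoolExecutor`) rather than `multiprocessing`, on the assumption that nearly all of the runtime is spent waiting on the database rather than computing in Python. Each query opens its own connection, since statements sent over a single psycopg2 connection would run one after another. The helper names `fetch_dataframe` and `run_concurrently`, and the `DBPORT` and `DBNAME` environment variables, are hypothetical and not part of the original code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_dataframe(query):
    """Run one query on its own connection and return the result as a DataFrame."""
    # Imported lazily so run_concurrently() can be tried out without a database.
    import pandas as pd
    import psycopg2

    # Assumption: DBPORT and DBNAME are hypothetical environment variables;
    # the original snippet only reads DBHOST, DBUSER and DBPWD.
    conn = psycopg2.connect(
        host=os.environ["DBHOST"],
        port=os.environ["DBPORT"],
        dbname=os.environ["DBNAME"],
        user=os.environ["DBUSER"],
        password=os.environ["DBPWD"],
    )
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()


def run_concurrently(queries, fetch=fetch_dataframe, max_workers=5):
    """Call fetch(query) for every query using a pool of worker threads.

    Executor.map yields results in the same order as the input queries,
    so the results are collected into a list without any thread ever
    appending to a shared list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, queries))
```

With this in place, `data_load = run_concurrently(query_load)` would stand in for the `for` loop, and `data_load[i]` would correspond to `query_load[i]`. Whether the total time actually drops depends on how well the database server handles five simultaneous scans of the same table, so the speedup is something to measure rather than assume.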