Here's a sample of what your script could look like. We are making two modifications:
conn_sql_server now takes these parameters:
- year: the year that should replace the value in the declare @year... line
- where_clause: a where clause of your choice
- before_clause_starts_with: the clause before which the where clause should be placed
A new modify_query function takes the lines read from the file and changes them based on the year you provided. If you provide a where clause, it is placed before the line that starts with the text you pass in before_clause_starts_with.
import pyodbc
import pandas as pd


def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    for line in lines:
        # Replace the DECLARE line with the requested year
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        # Insert the where clause just before the matching clause
        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    new_query = ''.join(new_lines)
    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
        file_path - query file path
        year - optional year to put in the DECLARE @Year line
        where_clause - optional where clause to insert
        before_clause_starts_with - start of the line the where clause goes before
    output:
        df - dataframe from the query result
    '''
    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                          'Server= servername;'
                          'Database = databasename;'
                          'Trusted_Connection=yes;')

    # Read the query file, modify it, run it and output the result to df
    with open(file_path, 'r') as query:
        lines = query.readlines()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    df = pd.read_sql_query(new_query, conn)
    return df


df1 = conn_sql_server('C:/Users/JJ/SQL script1',
                      year=1999,
                      where_clause='WHERE itemNumber = 1002345',
                      before_clause_starts_with='group by')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3',
                      year=1500)
Simulation
Let's run an example.
script1.sql
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
    SUM(list_price * quantity) gross_sales
FROM sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
script2.sql
DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year
script3.sql
DECLARE @Year INT = 2022;
SELECT GETDATE()
Using a script similar to the one above, with the database calls commented out and a few print statements added, we can see what each query looks like after it gets modified.
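To follow along without typing the files by hand, the three sample scripts can be written to disk with a short helper. This is only a convenience sketch: the helper name write_sample_scripts and the SAMPLE_SCRIPTS dictionary are made up for illustration, and the indentation inside script1.sql is an assumption.

```python
import tempfile
from pathlib import Path

# Contents of the three sample scripts shown above.
# The leading spaces inside script1.sql are an assumption.
SAMPLE_SCRIPTS = {
    "script1.sql": (
        "DECLARE @Year INT = 2022;\n"
        "SELECT YEAR(date) @Year,\n"
        "    SUM(list_price * quantity) gross_sales\n"
        "FROM sales.orders o\n"
        "    INNER JOIN sales.order_items i ON i.order_id = o.order_id\n"
        "GROUP BY YEAR(date)\n"
        "order by @Year"
    ),
    "script2.sql": (
        "DECLARE @Year INT = 2022;\n"
        "SELECT gross_sales\n"
        "FROM sales.orders\n"
        "order by @Year"
    ),
    "script3.sql": "DECLARE @Year INT = 2022;\nSELECT GETDATE()",
}


def write_sample_scripts(target_dir):
    """Write the sample .sql files into target_dir and return their paths."""
    target = Path(target_dir)
    paths = []
    for name, text in SAMPLE_SCRIPTS.items():
        path = target / name
        path.write_text(text)
        paths.append(path)
    return paths


# Example: write the files to a throwaway directory and list them.
with tempfile.TemporaryDirectory() as tmp:
    for p in write_sample_scripts(tmp):
        print(p.name, len(p.read_text().splitlines()), "lines")
```

To run the simulation script as written, call write_sample_scripts('.') once so the files land in the current working directory.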
Simulation script
#import pyodbc
#import pandas as pd


def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []
    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)
    for line in lines:
        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue
        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue
        new_lines.append(line)
    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)
    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
    input:
        file_path - query file path
    output:
        df - dataframe from the query result
    '''
    # Connect to SQL Server
    #conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
    #                      'Server= servername;'
    #                      'Database = databasename;'
    #                      'Trusted_Connection=yes;')

    # run query and output the result to df
    with open(file_path, 'r') as query:
        lines = query.readlines()
    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
    #df = pd.read_sql_query(new_query, conn)
    #return df


#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')
df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)
Query 1, as read from script1.sql, originally looked like this:
['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', ' SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', ' INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']
After running the script, the query will become
DECLARE @Year INT = 1999
SELECT YEAR(date) @Year,
    SUM(list_price * quantity) gross_sales
FROM sales.orders o
    INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year
Query 3 used to look like this:
['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']
It becomes
DECLARE @Year INT = 1500
SELECT GETDATE()
Give it a shot, and change the Python script as you see fit.
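One possible change, sketched here as an illustration rather than a drop-in replacement: the version above only matches a DECLARE line written exactly as "declare @year int =" at the start of the line, and it drops the trailing semicolon. The variant below uses a regular expression to tolerate leading whitespace and different spacing, and it keeps whatever followed the number. The name modify_query_tolerant and the DECLARE_YEAR pattern are made up for this example.

```python
import re

# Matches e.g. "DECLARE @Year INT = 2022;" with flexible spacing and case.
# Group 1 is everything before the number, group 2 everything after it.
DECLARE_YEAR = re.compile(
    r"^(\s*declare\s+@year\s+int\s*=\s*)\d+(\s*;?\s*)$", re.IGNORECASE
)


def modify_query_tolerant(lines, year=None, where_clause=None,
                          before_clause_starts_with=None):
    """Hypothetical variant of modify_query that is less strict about layout."""
    new_lines = []
    for line in lines:
        if year is not None:
            match = DECLARE_YEAR.match(line)
            if match:
                # int() rejects anything that is not a plain number
                new_lines.append(f"{match.group(1)}{int(year)}{match.group(2)}")
                continue
        if where_clause is not None and before_clause_starts_with is not None:
            # Ignore indentation when looking for the target clause
            if line.lstrip().lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
        new_lines.append(line)
    return "".join(new_lines)


sample = ["DECLARE @Year INT = 2022;\n", "SELECT GETDATE()"]
print(modify_query_tolerant(sample, year=1500))
```

Unlike the original, this sketch also skips the where clause silently when before_clause_starts_with is not given, instead of raising an error.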
... declare/set lines to your liking using Python. Similarly, you can add a where clause in Python also. If you edit your answer and add a sample of filepath1 or filepath2 and how you want the query to be changed, I can try to assist.

... DECLARE and use a parameter for @Year? (I don't know how to do parameterized statements in pd, but doubtlessly it's possible.) Similarly, for dynamic WHEREs use something like WHERE @itemNumber IS NULL OR itemNumber = @itemNumber, then set the @itemNumber parameter as appropriate. OPTION (RECOMPILE) comes in handy there to keep query plans that perform well. If at all possible you should avoid trying to tamper with the query text, as that's much more error prone.

... params argument that's used to pass parameter values by name. PyODBC doesn't support named parameters, so you'll have to use anonymous ones. If your query is, e.g., select * from SomeTable where ID=? you could use .read_sql(query, conn, params=(123,)). Note the trailing comma: (123) is just the number 123, not a tuple.
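The comments above suggest passing values as parameters instead of editing the query text. A minimal sketch of that idea follows, using the standard-library sqlite3 module as a stand-in for SQL Server so it runs anywhere; like pyodbc, it uses anonymous ? placeholders. The table order_items, its columns, and the function gross_sales are invented for this example, and whether the "? IS NULL OR ..." form binds cleanly on SQL Server may depend on the driver.

```python
import sqlite3

# sqlite3 stands in for SQL Server here; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (item_number INTEGER, order_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1002345, 1999, 10.0), (1002345, 2022, 20.0), (7, 1999, 5.0)],
)

# The query text never changes; only the parameter values do.
QUERY = """
SELECT order_year, SUM(amount) AS gross_sales
FROM order_items
WHERE order_year = ?
  AND (? IS NULL OR item_number = ?)
GROUP BY order_year
"""


def gross_sales(conn, year, item_number=None):
    """Run the fixed query with the given year and optional item filter."""
    # The optional filter appears twice in the SQL, so its value is passed twice.
    return conn.execute(QUERY, (year, item_number, item_number)).fetchall()


print(gross_sales(conn, 1999))           # all items for 1999
print(gross_sales(conn, 1999, 1002345))  # one item for 1999
```

With pandas and a pyodbc connection, the analogous call would be pd.read_sql_query(QUERY, conn, params=(year, item_number, item_number)).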