I'm building a Python function that creates SQL statements as a string that will later be passed to the database to extract data using a pyodbc connection.
I have trouble finding the right way to handle different combinations of filter arguments in both SELECT and WHERE statement without using multiple if/else statements and without breaking the syntax of the SQL.
Here is a minimal reproducible example:
sqldata.py
import pandas as pd
class SqlData(object):
def __init__(self, customers=None, last_version=None, attributes=None, age=None):
self.customers = customers
self.last_version = last_version
self.attributes = attributes
self.default_columns = ["customer_id", "date", "value"]
self.age = age
def buildSQL(self):
default_columns = ", ".join(self.default_columns)
if self.attributes == None:
attributes = " "
else:
attributes = " ," + ", ".join(self.attributes) + " "
if self.customers == None:
customers = " "
else:
customers = "customer_id in ('" + "','".join(self.customers) + "')"
if self.last_version == None:
last_version = " "
else:
last_version = " AND last_version = 'Y'"
if self.age == None:
age = " "
else:
age = " AND age > " + self.age
self.sql = "SELECT " + default_columns + attributes + \
"FROM myTable " + \
"WHERE " + customers + last_version + age
return self.sql
__init__.py
from sqldata import SqlData
__main__.py
from sqldata import SqlData
data = SqlData(customers = ["Arthur", "Ben"],
last_version = True,
attributes = ["name", "age"],
age = "14")
print(data.buildSQL())
If all arguments are given it works just fine and I get the following:
SELECT customer_id, date, value ,name, age
FROM myTable
WHERE customer_id in ('Arthur','Ben') AND last_version = 'Y' AND age > 14
But if customers is not given, it makes no more sense:
SELECT customer_id, date, value ,name, age
FROM myTable
WHERE AND last_version = 'Y' AND age > 14
Ideally, when customers is not given, I would like to have the following:
SELECT customer_id, date, value ,name, age
FROM myTable
WHERE last_version = 'Y' AND age > 14
All in all, I would like to know the following:
- Is it possible to "bypass" the use of those multiple if/else statements ? There must be a better way.
- How to handle the case where the
customersvariable is not given, hence breaking the syntax of the SQL statement?
customersare given. Why don't you use the same test to determine whether you should outputANDor not?customers,last_version, andageare allNone? then you shouldn't have a WHERE clause at all. I don't really have an answer for you, but my suggestion is you should separate your code to smaller functions, that deal with the different clauses of the query separately, then create a function that combines them in the correct manner