I am trying to create a methodology for passing parameters automatically through something like locals(), similarly to how f-strings work.

How it currently works

import pandas as pd

def my_func(conn, string_id, date, integer_ids):
    sql = f"""    
    select * from TABLE a
    where STRING_ID = '{string_id}'
    and DATE = {date}
    and INTEGER_ID in ({','.join(map(str, integer_ids))})"""
    df = pd.read_sql(sql, conn)
    return df

However, this approach means I cannot copy-paste the SQL into SQL developer or similar, and run it from there. So I would like an approach that makes use of parameters instead.

There seem to be two problems with that:

  1. Parameters must be literals, so it's not possible to pass along lists
  2. I need to create a dictionary manually, and cannot simply pass something like locals()

How I would like it to work would be something like the example below (which obviously doesn't work)

import pandas as pd

def my_func(conn, string_id, date, integer_ids):
    sql = """    
    select * from TABLE
    where STRING_ID = :string_id
    and DATE = :date
    and INTEGER_ID in :integer_ids"""
    df = pd.read_sql(sql, conn, params=locals())
    return df

EDIT: Thanks to perl, I now have a working solution to my problem

import re

import pandas as pd
import sqlalchemy


def read_sql(sql, conn, params):
    # Finds all words following a ":" sign in the sql
    for p in re.findall(r':(\w+)', sql):
        if isinstance(params.get(p), (tuple, list)):
            # Expand the list into one named parameter per element
            ext_params = {f'{p}_{i:03d}': p_i for i, p_i in enumerate(params.get(p))}
            # \b stops ":p" from also matching a longer name that starts with p
            sql = re.sub(rf':{p}\b', f"(:{', :'.join(ext_params)})", sql)
            params.update(ext_params)

    sql_text = sqlalchemy.text(sql)
    return pd.read_sql(sql_text, conn, params=params)


def my_func(conn, string_id, date, integer_ids):
    sql = """    
    select * from TABLE
    where STRING_ID = :string_id
    and DATE = :date
    and INTEGER_ID in :integer_ids"""
    df = read_sql(sql, conn, locals())
    return df
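
To make the expansion step easier to follow, here is a minimal sketch of the same idea using only the standard library. It assumes sqlite3 as a stand-in database (the question targets Oracle), and `expand_list_params`, the table `t` and its columns are hypothetical names used for illustration only. Unlike the version above, it returns a new dict rather than updating the one passed in.

```python
import re
import sqlite3


def expand_list_params(sql, params):
    """Return (sql, params) with every list/tuple bind expanded into scalar binds."""
    out = dict(params)  # copy so the caller's dict is left untouched
    for name in set(re.findall(r":(\w+)", sql)):
        value = params.get(name)
        if isinstance(value, (list, tuple)):
            expanded = {f"{name}_{i:03d}": v for i, v in enumerate(value)}
            placeholders = ", ".join(f":{k}" for k in expanded)
            # \b keeps ":ids" from also matching a longer name such as ":ids_extra"
            sql = re.sub(rf":{name}\b", f"({placeholders})", sql)
            del out[name]
            out.update(expanded)
    return sql, out


# Hypothetical test table, only here to make the sketch runnable
conn = sqlite3.connect(":memory:")
conn.execute("create table t (string_id text, integer_id integer)")
conn.executemany("insert into t values (?, ?)", [("a", 1), ("a", 2), ("b", 3)])

sql, params = expand_list_params(
    "select integer_id from t "
    "where string_id = :string_id and integer_id in :integer_ids "
    "order by integer_id",
    {"string_id": "a", "integer_ids": [1, 2, 3]},
)
rows = conn.execute(sql, params).fetchall()
```

One caveat of any regex-based approach like this: a colon inside a string literal in the SQL (for example a time such as '12:30') would also be picked up by the pattern, so the templates need to avoid that.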

EDIT2: For anyone finding this question, I have since extended the solution to handle lists longer than 1000 elements, by splitting them across several queries and concatenating the results

import re
from typing import List, Tuple

import numpy as np
import pandas as pd
import sqlalchemy
from sqlalchemy.sql.elements import TextClause


def generate_sql(sql: str, params: dict = None, param_key: str = ':') -> List[Tuple[TextClause, dict]]:
    if params is None:
        params = dict()
    max_sql_params = 1000

    out = []
    # Finds all words following the param_key (":" by default) in the query
    for p in set(re.findall(rf"{re.escape(param_key)}(\w+)", sql)):
        if isinstance(params.get(p), (tuple, list, np.ndarray)):
            # Recursively call function for variables with more than 1000 elements
            if len(params[p]) > max_sql_params:
                new_params = params.copy()  # NB: Shallow copy sufficient as the values are only sliced, never modified
                new_params[p] = params[p][max_sql_params:]
                out.extend(generate_sql(sql=sql, params=new_params, param_key=param_key))
            extra_params = {f"{p}_{i:03d}": p_i for i, p_i in enumerate(params[p][:max_sql_params])}
            # Match on param_key (as in the search above); \b avoids touching longer names.
            # The replacement always uses ":" because sqlalchemy.text expects :name binds.
            sql = re.sub(rf"{re.escape(param_key)}{p}\b", f"(:{', :'.join(extra_params)})", sql)
            params.update(extra_params)

    sql_text = sqlalchemy.text(sql)
    out.append((sql_text, params))
    return out


def read_sql(sql: str, conn: sqlalchemy.engine.Engine, params: dict = None) -> pd.DataFrame:
    sql_tuples = generate_sql(sql=sql, params=params)
    df = pd.concat(pd.read_sql(sql=s, con=conn, params=p) for s, p in sql_tuples)

    return df
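
The 1000-element cap matches Oracle's limit on the number of expressions in an IN list (ORA-01795). The recursion above effectively splits a long list into consecutive slices of at most 1000 elements; the slicing on its own can be sketched as below, where `chunked` is a hypothetical helper name and not part of the code above.

```python
def chunked(values, size=1000):
    """Yield consecutive slices of `values` with at most `size` elements each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


ids = list(range(2500))
chunks = list(chunked(ids))
sizes = [len(c) for c in chunks]  # 2500 ids split into 1000 + 1000 + 500
```

Note that running one query per slice and concatenating the results only reproduces the single-query result when the list is used as a plain IN filter. With NOT IN, or with aggregates in the select, the per-slice results would differ from what one query over the full list returns.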
  • Would "return df, sql" work? At least you'd get the SQL with the parameters filled in as a string you can print() and cut/paste. Commented May 27, 2021 at 15:53
  • @JonathanLeon That would require me to run the code to get a string that can be copy-pasted; I need something that works without running the code. Commented May 27, 2021 at 18:37

2 Answers

You can use parametrized queries by wrapping the query in sqlalchemy.text and converting lists to tuples. For example:

import pandas as pd
import sqlalchemy


def my_func(conn, min_number, letters):
    # convert lists to tuples
    letters = tuple(letters)
    
    # wrap sql in sqlalchemy.text
    sql = sqlalchemy.text("""    
        SELECT *
        FROM letters
        WHERE
            number >= :min_number AND
            letter in :letters""")
    
    # read and return the resulting dataframe
    df = pd.read_sql(sql, conn, params=locals())
    return df

my_func(conn, 10, ['a', 'b', 'c', 'x', 'y', 'z'])

Output:

  letter  number
0      x      23
1      y      24
2      z      25

For completeness of the example, the following was used as a test table:

import string

df = pd.DataFrame({
    'letter': list(string.ascii_lowercase),
    'number': range(len(string.ascii_lowercase))})
df.to_sql('letters', conn, index=False)

Update: Here's a possible workaround for Oracle to make it work with lists:

def get_query(sql, **kwargs):
    for k, v in kwargs.items():
        vs = "','".join(v)
        sql = sql.replace(f':{k}', f"('{vs}')")
    return sql

def my_func(conn, min_number, letters):
    sql_template = """    
        SELECT *
        FROM letters
        WHERE
            number >= :min_number AND
            letter in :letters
    """
    # pass list variables to `get_query` function as named parameters
    # to get parameters replaced with ('value1', 'value2', ..., 'valueN')
    sql = sqlalchemy.text(
        get_query(sql_template, letters=letters))
    
    df = pd.read_sql(sql, conn, params=locals())
    return df

my_func(conn, 10, ['a', 'b', 'c', 'x', 'y', 'z'])

Update 2: Here's a version of get_query that works with both strings and numbers (strings are enclosed in quotes, numbers are not):

def get_query(sql, **kwargs):
    # enclose strings in quotes, but not numbers
    def q(x):
        quote = '' if isinstance(x, (int, float)) else "'"
        return f'{quote}{x}{quote}'
    
    # replace with values
    for k, v in kwargs.items():
        sql = sql.replace(f':{k}', f"({','.join([q(x) for x in v])})")

    return sql

For example:

sql = """    
SELECT *
FROM letters
WHERE
    number in :numbers AND
    letters in :letters
"""

get_query(sql,
          numbers=[1, 2, 3],
          letters=['A', 'B', 'C'])

Output:

SELECT *
FROM letters
WHERE
    number in (1,2,3) AND
    letters in ('A','B','C')
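
Because get_query pastes the values into the SQL text, a string that contains a single quote would break the statement, and untrusted input could inject SQL. Below is a hedged sketch of a stricter formatter, assuming standard SQL quoting where a quote inside a string is escaped by doubling it. The names `sql_literal` and `in_list` are hypothetical and not part of the answer above.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    # bool is a subclass of int, so reject it explicitly rather than emit True/False
    if isinstance(value, bool):
        raise TypeError("bool has no portable SQL literal")
    if isinstance(value, (int, float)):
        return str(value)
    # Standard SQL escapes a single quote inside a string by doubling it
    return "'" + str(value).replace("'", "''") + "'"


def in_list(values):
    """Render an iterable as a parenthesised, comma-separated SQL list."""
    return "(" + ",".join(sql_literal(v) for v in values) + ")"


numbers_sql = in_list([1, 2, 3])
letters_sql = in_list(["A", "O'Brien"])
```

This only covers numbers and strings; dates and other types would need their own, database-specific formatting, so bound parameters remain the safer choice wherever the driver supports them.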

14 Comments

Apparently it's not supported in Oracle driver the same way it is in Postgres (see here). The workaround that they recommend is generating the sql in code, which is of course what we're trying to avoid here...
I see, so perhaps there is no solution to my issue then. That is unfortunate. If I don't get another solution I'll accept your answer - thanks
And if the overall idea works for you, we can improve get_query to dynamically check for numeric types that don't need to be enclosed in quotes and generate the list of values accordingly
That looks nice, yeah, a dynamic get_query that replaces list variables automatically might do the trick
@oskros Cool, see update 2 for updated get_query function that can handle numeric types as well

Why not this:

import pandas as pd

def my_func(conn, string_id, date, integer_ids):
    sql = """    
    select * from RISK
    where STRING_ID = %s
    and DATE = %s
    and INTEGER_ID in %s"""
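    # params must be passed by keyword: the third positional argument of pd.read_sql is index_col
    df = pd.read_sql(sql, conn, params=(string_id, date, integer_ids))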
    return df
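
Whether %s is accepted at all depends on the DB-API driver: each driver module advertises its placeholder style in a `paramstyle` attribute (PEP 249). As a minimal sketch, assuming the standard library's sqlite3 as a stand-in (its style is "qmark", so placeholders are written as ?), a list can be passed positionally by generating one placeholder per element. The table `risk` and its columns here are made up for the example.

```python
import sqlite3

style = sqlite3.paramstyle  # placeholder style advertised by the driver module

# Hypothetical test table, only here to make the sketch runnable
conn = sqlite3.connect(":memory:")
conn.execute("create table risk (string_id text, integer_id integer)")
conn.executemany("insert into risk values (?, ?)", [("a", 1), ("a", 2), ("b", 3)])

integer_ids = [1, 3]
# One ? per list element, because sqlite3 binds a single value per placeholder
marks = ", ".join("?" for _ in integer_ids)
sql = f"select integer_id from risk where string_id = ? and integer_id in ({marks})"
rows = conn.execute(sql, ["a", *integer_ids]).fetchall()
```

Only the placeholders are generated with an f-string; the values themselves still travel as bound parameters.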

7 Comments

Because it can't be copy-pasted into an SQL interpreter
@oskros I'm not following, copy-pasting what?
Copy-pasting the SQL string directly from the .py file into an SQL interpreter - the %s will give a syntax error
I see, but why would you want that?
I want SQL stored in separate .sql files, which are then either executed directly through an SQL interpreter by inputting parameter values, or loaded by Python to run in a script - this is needed because some of my company's code runs with PL/SQL and other parts are written in Python
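
The workflow described in the last comment can be sketched with the standard library alone. The assumptions here are loud ones: the file name `query.sql`, the table `t` and the use of sqlite3 are all stand-ins chosen so the example runs anywhere, not details from the question.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical query file; in practice it would live next to the PL/SQL sources
sql_dir = Path(tempfile.mkdtemp())
sql_file = sql_dir / "query.sql"
sql_file.write_text(
    "select integer_id from t where string_id = :string_id order by integer_id\n"
)

# Hypothetical test table, only here to make the sketch runnable
conn = sqlite3.connect(":memory:")
conn.execute("create table t (string_id text, integer_id integer)")
conn.executemany("insert into t values (?, ?)", [("a", 1), ("a", 2), ("b", 3)])

# The file content is passed through unchanged; values travel separately as binds
sql = sql_file.read_text()
rows = conn.execute(sql, {"string_id": "a"}).fetchall()
```

Since the file only contains :name placeholders, the same text can be pasted into an interpreter that prompts for bind values, which is the property the %s style lacks.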