Executing SQL on Pandas Dataframe and storing results in same Dataframe

Question

I have a DataFrame (shown in an image in the original post). What I want to do is loop through the SQL statements under SQL_SCRIPT, execute them, and store the results in a new column called 'RESULTS'. When I just execute the statements (without storing the results anywhere) it runs fine, but when I try to store the results in a new DataFrame column it errors out with:

ValueError: cannot set a row with mismatched columns

Here is the code:

def run_tests(self):
    s = self.connection()
    df = self.retrieve_sql()
    df_type = df.loc[df['STEP_TYPE'] == 'T']
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']]
    print(df_to_list)
    for sql_script in df_to_list['SQL_SCRIPT']:
        df_to_list.loc['RESULTS'] = pd.read_sql(sql_script,s)
    print(df_to_list)

Instead of read_sql, I have also tried calling execute on the session, which also works, but I'm not sure how to store the results in the DataFrame that way:

def run_tests(self):
    s = self.connection()
    df = self.retrieve_sql()
    df_type = df.loc[df['STEP_TYPE'] == 'T']
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']]
    print(df_to_list)
    for sql_script in df_to_list['SQL_SCRIPT']:
        s.execute(sql_script)

Here is the connection function, if needed:

def connection(self):
    con = self.load_json_file()
    cfg_dsn = con['config']['dsn']
    cfg_usr = con['config']['username']
    cfg_pwd = con['config']['password']

    udaExec = teradata.UdaExec(appName="DataAnalysis", version="1.0", logConsole=False)
    session = udaExec.connect(method="odbc", dsn=cfg_dsn, username=cfg_usr, password=cfg_pwd)

    return session

Parfait · Accepted Answer · 2019-02-09 00:29:55Z


Consider running Series.apply on the column of SQL strings.

import numpy as np

def run_tests(self):
    s = self.connection()
    c = s.cursor()              # open a cursor on the session
    df = self.retrieve_sql()

    df_type = df.loc[df['STEP_TYPE'] == 'T']
    # .copy() so that adding a column below does not raise SettingWithCopyWarning
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']].copy()
    print(df_to_list)

    # run one query and return the first column of its first row
    def sql_run(x):
        c.execute(x)
        if c.rowcount > 0:
            res = c.fetchone()[0]
        else:
            res = np.nan        # query returned no rows
        return res

    df_to_list['RESULTS'] = df_to_list['SQL_SCRIPT'].apply(sql_run)
    print(df_to_list)
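
As for why the original attempt failed: .loc with a single label addresses a row, so df_to_list.loc['RESULTS'] = ... tries to add a row named 'RESULTS' rather than a column. The sketch below reproduces that on a small made-up frame (the frame contents and the use of len as a stand-in for a query are assumptions for illustration only) and contrasts it with column assignment.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the one in the question.
frame = pd.DataFrame({"TABLE_NM": ["a", "b"], "SQL_SCRIPT": ["q1", "q2"]})

# .loc with a single label addresses a ROW. Enlarging the frame with a new row
# needs one value per column, so a one-element value is rejected.
row_error = None
try:
    frame.loc["RESULTS"] = [5]
except ValueError as exc:
    row_error = exc
print("row assignment failed:", row_error)

# Plain [] assignment addresses a COLUMN and accepts one value per row.
# len is only a stand-in here for "run the query and return a scalar".
frame["RESULTS"] = frame["SQL_SCRIPT"].apply(len)
print(frame)
```

The column form works because Series.apply returns one value per row, which lines up with the frame's index.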

edited Feb 9, 2019 at 0:29

answered Feb 8, 2019 at 20:54


3 Comments

JD2775 Over a year ago

Thank you @Parfait. This is returning "'NoneType' object is not subscriptable". I think this may be because some SELECT queries aren't count(*) queries and they return nothing (not 0), which is expected. Is there a way to handle this, perhaps by putting a default value in the DataFrame cell instead? I am not sure. Sorry, I didn't include that in the original question, but I think that may be the issue.

Parfait Over a year ago

Add a handler in the function passed to apply. See the update, which uses an if statement to check the cursor's row count.

JD2775 Over a year ago

Thank you Parfait! That worked perfectly. Very, very much appreciated.
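
A note on the row-count check discussed in these comments: it relies on the driver reporting a meaningful rowcount after a SELECT, which not every DB-API driver does (Python's sqlite3, for one, reports -1 for SELECT statements). A driver-independent alternative is to test the fetched row itself. The following is a minimal sketch, assuming an in-memory SQLite database in place of the Teradata session; the table t, its rows, and the helper name first_value are made up for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

# In-memory SQLite database standing in for the Teradata session (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")
con.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,), (3,)])
cur = con.cursor()

# Hypothetical frame shaped like the one in the question.
tests = pd.DataFrame({
    "TABLE_NM": ["t", "t"],
    "SQL_SCRIPT": [
        "SELECT COUNT(*) FROM t",          # returns one row
        "SELECT id FROM t WHERE id > 99",  # returns no rows
    ],
})


def first_value(sql):
    """Run one statement; return the first column of the first row, or NaN."""
    cur.execute(sql)
    row = cur.fetchone()  # None when the query produced no rows
    return row[0] if row is not None else np.nan


tests["RESULTS"] = tests["SQL_SCRIPT"].apply(first_value)
print(tests)
```

Checking the row for None avoids the 'NoneType' object is not subscriptable error without depending on how a particular driver fills in rowcount.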
