1

I have a python script where I am trying to loop through multiple sql queries, take the outputs of those queries as simple count(*)'s, and concatenate them into one dataframe.

def function(query):

conn = ...

sql_query = open(query, 'r').read()

df_sql_output = pd.read_sql_query(sql_query,conn)
concatenated_data = []
if not df_sql_output.empty:
        concatenated_data.append(df_sql_output)
concatenated_data = pd.concat(concatenated_data, axis=1)
print(concatenated_data)
sqls = ['test1.sql','test2.sql','test3.sql']

for i in sqls:
    function(i)
2
  • 1
    What is your question? Commented Aug 29, 2018 at 14:19
  • @roganjosh when I run this script it is not concatenating this into on df. Rather it is printing a different output for each query in the loop. So I am wondering how I can better loop through queries and concatenate the outputs into one dataframe. Commented Aug 29, 2018 at 14:22

2 Answers 2

1

This is just basic Python really. Your code isn't clear because you haven't indented it properly here, but in any case it can't work because concatenated_data is declared within the scope of function and is lost when the function exits. There's no way for its effects to persist. You need to learn how functions work. The Python tutorial is the place to look: https://docs.python.org/3/tutorial/controlflow.html#defining-functions. In the meantime here is roughly how to fix your code worrying about wrapping anything in a function:

conn = ...

query_paths = ['test1.sql', 'test2.sql', 'test3.sql']

query_results = []

for query_path in query_paths:

    with open(query_path) as file:
        query = file.read()

    query_results.append(pd.read_sql_query(query, conn))

result = pd.concat(query_results)
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much. This was super helpful. I am on week two of learning Python so I will check out that tutorial you recommended.
0

This is the rough idea how can we do it. If you update your code. I can update my answer.

import pandas as pd


sqls = ['test1.sql','test2.sql','test3.sql']

def func(query):
    conn = ...
    df = pd.DataFrame()
    if len(df) == 0:
        df = pd.read_sql(query_string,conn)
    else:
        df_temp = df.copy()
        temp = pd.read_sql(query_string,conn) 
        df = pd.concat([df, df_temp], ignore_index=True)
    return df

for query in sqls:
    func(query)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.