How to read an SQLite file in a Python/pandas for-loop with a query parameter set in the loop?

Question

I want to extract a number of tables from an SQLite database. The tables have different numbers of rows, so it is natural to store them in a Python list to facilitate further data analysis. The following code works.

    import sqlite3
    import pandas as pd
    conn = sqlite3.connect("Database")
    data = []
    data.append(pd.read_sql("""SELECT ID,Time,A,B FROM Main WHERE BatchID=='BATCH1'""", conn))
    data.append(pd.read_sql("""SELECT ID,Time,A,B FROM Main WHERE BatchID=='BATCH2'""", conn))
    conn.close()
    print(data[0]['Time'])

Instead of repeating the code for each BatchID it would be convenient to have a for-loop, something like

    conn = sqlite3.connect("Database")
    data = []
    batch = ['BATCH1', 'BATCH2']
    for k in list(range(2)): 
       data.append(pd.read_sql("""SELECT ID,Time,A,B FROM Main WHERE BatchID='eval(batch[k])'""", conn))                                       
    conn.close()
    print(data[0]['Time'])

But this does not work. If I try to read only one table with this technique, writing eval(batch[0]) explicitly, I get a table with only the keys but no data.

On request, I am adding some context on why I have a list of DataFrames. What I typically want to do is to easily plot how A varies with Time for different batches. The set of batches of interest can be a specific batch, a set of batches, or all of them. The code for the plot should be simple and transparent.

    for k in batches: ax1.plot(data[k]['Time'], data[k]['A'])

But this command can perhaps be made just as simple by selecting from a single DataFrame that holds all batches with the selected variables. I also thought there is a conceptual simplicity here: we have a list of plots that the command above overlays in the same diagram.

I would also like to make computations on subsets of the data in a similar way.

An alternative approach, suggested below by JPI93, is to simplify the first step and make one large DataFrame containing data from all batches with the selected variables. I think this leads to a somewhat longer command to make the desired diagram. The code is below:

    ...
    data = pd.read_sql("""SELECT BatchID,ID,Time,A,B FROM Main""",conn)
    index = []
    index.append(data['BatchID'] == ' Batch1')
    index.append(data['BatchID'] == ' Batch2')
    batches = list(range(2))

Then we can plot with the following command

    for k in batches: ax1.plot(data.loc[index[k], 'Time'], data.loc[index[k], 'A'])

I tend to favour the original plot command above, but then I need to solve the original problem of making a list of DataFrames. Or is there some other approach that makes the plot command simple and readable?

JPI93 · Accepted Answer · 2020-12-16 16:09:49Z

Given the following starting code used to test solutions provided below:

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect(':memory:')
    c = conn.cursor()

    with conn:
        c.execute('''
            CREATE TABLE IF NOT EXISTS Main(
              ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
              Time TEXT NOT NULL,
              A TEXT NOT NULL,
              B TEXT NOT NULL,
              BatchID TEXT NOT NULL);''')
        c.execute('''
            INSERT INTO Main (Time, A, B, BatchID) VALUES
              ('08:15:00', 'Atext1', 'Btext1', 'BATCH1'),
              ('08:30:00', 'Atext2', 'Btext2', 'BATCH2'),
              ('08:30:45', 'Atext3', 'Btext3', 'BATCH3'),
              ('25:15:50', 'Atext1.1', 'Btext1.1', 'BATCH1'),
              ('18:30:60', 'Atext2.1', 'Btext2.1', 'BATCH2'),
              ('00:04:45', 'Atext3.1', 'Btext3.1', 'BATCH3');''')

    batch = ['BATCH1', 'BATCH2']

There are a few ways that you could tackle the problem of creating a list of pandas.DataFrame objects reflecting the desired values from the Main table delimited by Main.BatchID.

Solution 1

This solution uses an approach similar to the one hinted at in your original post, using Python f-strings to insert each value from batch into the query used to populate data.

    data = [pd.read_sql(f"""SELECT ID,Time,A,B FROM Main WHERE BatchID='{b}'""", conn) for b in batch]
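A variation worth considering, not part of the original answer: instead of formatting the value into the SQL text, it can be passed through the params argument of pd.read_sql, with a ? placeholder in the query (the placeholder style used by sqlite3). This sidesteps quoting problems and SQL injection if batch names ever come from user input. The sketch below builds a small in-memory table as a stand-in for Main; the table contents are made up for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory stand-in for the Main table used above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Main (ID INTEGER PRIMARY KEY, Time TEXT, A TEXT, B TEXT, BatchID TEXT)"
)
conn.executemany(
    "INSERT INTO Main (Time, A, B, BatchID) VALUES (?, ?, ?, ?)",
    [
        ("08:15:00", "Atext1", "Btext1", "BATCH1"),
        ("08:30:00", "Atext2", "Btext2", "BATCH2"),
        ("08:30:45", "Atext3", "Btext3", "BATCH3"),
    ],
)

batch = ["BATCH1", "BATCH2"]
query = "SELECT ID, Time, A, B FROM Main WHERE BatchID = ?"

# The driver binds each value to the ? placeholder; nothing is pasted into the SQL text.
data = [pd.read_sql(query, conn, params=(b,)) for b in batch]
conn.close()

print(data[0]["Time"])
```

The resulting list has the same shape as in Solution 1, so the plotting loop from the question should work unchanged.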

Solution 2

This solution only queries the database once, returning all values from Main. It then filters the resultant df based on batch values to populate data as required.

    df = pd.read_sql('SELECT ID, Time, A, B, BatchID FROM Main', conn)
    # One DataFrame per value in batch, with the BatchID column omitted from the output as per the OP
    data = [df[df['BatchID'] == b].iloc[:, df.columns != 'BatchID'] for b in batch]
    # Filter out any zero-row results (i.e. values in batch that are not present in Main.BatchID)
    data = [d for d in data if d.shape[0] > 0]

Questioning the Question

It seems fairly likely that this question is actually an XY Problem, though of course I may be off.

The main reason for my thinking this is that it seems superfluous to create a list of separate pandas.DataFrame instances to essentially tackle the problem of filtering results. pandas provides functionality for such filtering on a single pandas.DataFrame (as illustrated in Solution 2), in a manner potentially much more efficient and less cumbersome when it comes to later analysis.

It might be worth checking out this documentation on selecting subsets before committing to a solution of the question asked over a different approach.
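As a rough sketch of what the single-DataFrame route could look like, the snippet below groups one query result by BatchID and keeps the pieces in a dictionary keyed by batch name. The numeric sample rows and the names selected and frames are hypothetical, chosen only for illustration; groupby, isin and drop are standard pandas calls.

```python
import sqlite3
import pandas as pd

# Hypothetical numeric sample data standing in for the real Main table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Main (ID INTEGER PRIMARY KEY, Time REAL, A REAL, B REAL, BatchID TEXT)"
)
conn.executemany(
    "INSERT INTO Main (Time, A, B, BatchID) VALUES (?, ?, ?, ?)",
    [
        (0.0, 1.0, 5.0, "BATCH1"),
        (1.0, 2.0, 6.0, "BATCH1"),
        (0.0, 3.0, 7.0, "BATCH2"),
        (1.0, 4.0, 8.0, "BATCH3"),
    ],
)
df = pd.read_sql("SELECT ID, Time, A, B, BatchID FROM Main", conn)
conn.close()

batch = ["BATCH1", "BATCH2"]

# Keep only the batches of interest, then split into one sub-DataFrame per batch.
selected = df[df["BatchID"].isin(batch)]
frames = {
    name: group.drop(columns="BatchID")
    for name, group in selected.groupby("BatchID")
}

for name in batch:
    print(name, frames[name]["A"].tolist())
```

With this layout the plot loop from the question stays a one-liner, for example for k in batch: ax1.plot(frames[k]['Time'], frames[k]['A']). Note that a name in batch that has no rows in the table will be missing from frames, so frames[k] would raise a KeyError for it.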

I added context to the problem; see the bottom of the post. I need to think about your suggestion of keeping a single DataFrame with all batches and the selected variables.
With your example database, your code works for me. After modifying it so that A contains REAL values and putting in some numbers, I can even make a diagram in my preferred way, so I will go for Solution 1. It seems I have some subtle problem with my own database.

Quang Hoang · Accepted Answer · 2020-12-16 14:25:43Z


If you are on Python 3.6+, try building the query with an f-string:

    conn = sqlite3.connect("Database")
    data = []
    batch = ['BATCH1', 'BATCH2']
    for k in list(range(2)):
        query = batch[k]
        data.append(pd.read_sql(f"""SELECT ID,Time,A,B FROM Main WHERE BatchID='{query}'""", conn))
    conn.close()
    print(data[0]['Time'])

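Or you can use the older str.format method (note that the string must not have the f prefix here, since an empty {} is a syntax error in an f-string):

    data.append(pd.read_sql("""SELECT ID,Time,A,B
                               FROM Main
                               WHERE BatchID='{}'""".format(query),
                            conn))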


4 Comments

janpeter Over a year ago

I use Python 3.6.4. Tried your first script and get tables with only the keys, no content. The code at the top of my post brings tables with contents also. Any idea?

Quang Hoang Over a year ago

It's strange, since BatchID=='BATCH1' is not standard SQL syntax (although SQLite does accept == as an equality operator), while BatchID='BATCH1' is. Try changing my code to BatchID=='{query}'.

janpeter Over a year ago

Both give the same result: a list of empty DataFrames with only the keys (correctly). By the way, I do the work in a Jupyter notebook.

janpeter Over a year ago

When I run your code with JPI93's example database, your code also works fine! It seems I have a subtle problem with my own database.
