0

I have an MS SQL table as follows:

Device ID       Timestamp               Avg_PF  THDV_Sum
863071010842661 2014-01-01 22:05:57     4.0       7.0   
865733020495321 2016-08-19 17:20:09     0.0       0.0  
865733020495321 2016-08-19 17:20:41     0.0       0.0   
865733020495321 2016-08-19 17:20:41     0.0       0.0 

There are 287,533 rows comprising data for 30 devices (i.e. there are 30 unique Device IDs) at 10/15-minute intervals. I want to retrieve the rows where the Timestamp date is >= 2018-10-01. In SSMS (SQL Server 2014 Management Studio) I can do this easily with the following SQL:

SELECT Device ID, Timestamp, Avg_PF, THDV_Sum 
FROM mytable
WHERE Timestamp >= '2018-10-01'

Now I am trying to do the same in Python as follows:

import pandas as pd
import pyodbc

conn = pyodbc.connect('details of SQL server')
df_select = pd.read_sql_query(sql, conn)

Here I pass the SQL statement above as the sql string. However, it retrieves the entire table, starting from Timestamp = 2014-01-01. I think I need to modify the sql string passed to pd.read_sql_query. My question is: how can I add a filter such as the WHERE clause above to the sql string that I use in pd.read_sql_query?

1
  • What is the actual value of sql that you pass to the server? Commented Jan 3, 2019 at 5:57

2 Answers

2

I would go about it like this:

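import pandas as pd
from sqlalchemy import create_engine

# In Jupyter, put %%time on the first line of the cell to measure the run time.

# Parameters (replace with your own values)
ServerName = "SQLSRV01"
Database = "Database"
Driver = "driver=SQL Server Native Client 11.0"

# Create the connection
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database + "?" + Driver)

# Triple quotes let the SQL string span several lines.
# If the column name really contains a space, write it as [Device ID].
df = pd.read_sql_query("""SELECT Device ID, Timestamp, Avg_PF, THDV_Sum
                          FROM mytable
                          WHERE Timestamp >= '2018-10-01'""",
                       engine)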
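
As a possible variation on the above, the cutoff date can be passed separately through the params argument of pd.read_sql_query instead of being written into the SQL string. The sketch below is only an illustration under some assumptions: it uses an in-memory SQLite database as a stand-in for SQL Server so that it runs anywhere, the column name device_id is hypothetical, and the last row is made up so that something passes the filter. The ? placeholder is the style used by both sqlite3 and pyodbc.

```python
import sqlite3

import pandas as pd

# Stand-in for the SQL Server table (assumption): an in-memory SQLite database.
# device_id is a hypothetical column name used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (device_id TEXT, Timestamp TEXT, Avg_PF REAL, THDV_Sum REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("863071010842661", "2014-01-01 22:05:57", 4.0, 7.0),
        ("865733020495321", "2016-08-19 17:20:09", 0.0, 0.0),
        ("865733020495321", "2018-10-02 09:15:00", 0.9, 3.5),  # made-up row after the cutoff
    ],
)
conn.commit()

# The filter lives in the SQL text; the value is supplied via params.
sql = """
SELECT device_id, Timestamp, Avg_PF, THDV_Sum
FROM mytable
WHERE Timestamp >= ?
"""
df_select = pd.read_sql_query(sql, conn, params=("2018-10-01",))
print(df_select)
```

With a pyodbc connection the call to pd.read_sql_query should look the same; only the connection object changes. Keeping the value out of the string also avoids having to juggle single and double quotes inside the Python literal.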

1 Comment

Thanks SQL_M. It worked. Actually, what I did was sql='SELECT Device ID, Timestamp, Avg_PF, THDV_Sum FROM mytable WHERE Timestamp >= "2018-10-01" ' i.e. I just swapped the single and double quotes.
0

Use the parse_dates argument of the read_sql_query function like so:

df_select = pd.read_sql_query(sql, conn, parse_dates=['Timestamp'])

2 Comments

parse_dates is applied after the query is executed.
I'm not sure about that. Could you provide some documentation for this assumption? Anyway, I think what is important for OP is what is retrieved after the query is run, and using the parse_dates argument as mentioned would get the job done.
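
To make the point under discussion concrete, here is a small sketch of what parse_dates does and does not do. It assumes an in-memory SQLite table as a stand-in for the real one, and the second row is made up for illustration. The query has no WHERE clause, so every row comes back; parse_dates only converts the column to a datetime type, and any date filtering then has to happen on the client side.

```python
import sqlite3

import pandas as pd

# Stand-in table (assumption): two rows, one before and one after the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Timestamp TEXT, Avg_PF REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2014-01-01 22:05:57", 4.0),
        ("2018-10-02 09:15:00", 0.9),  # made-up row after the cutoff
    ],
)
conn.commit()

sql = "SELECT Timestamp, Avg_PF FROM mytable"  # no WHERE clause

# parse_dates converts the column type; it does not remove any rows.
df_all = pd.read_sql_query(sql, conn, parse_dates=["Timestamp"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df_all["Timestamp"])
print(len(df_all), is_datetime)

# Filtering in pandas works, but only after all rows were transferred.
df_recent = df_all[df_all["Timestamp"] >= pd.Timestamp("2018-10-01")]
print(df_recent)
```

For a table with a few hundred thousand rows either approach may be acceptable, but filtering in the WHERE clause avoids transferring rows that are discarded afterwards.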
