Error with query in pandasql

Question

I am very new to PandaSQL and have never used it before. Here is my code up until now:

import pandas as pd
from pandasql import sqldf
import numpy as np

tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252')
query = """SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) 
FROM TASKS 
GROUP BY Work Item Id, Parent Work Item Id;"""

df = sqldf(query, locals())
print(df.head(5))

I am getting this error:

'pandasql.sqldf.PandaSQLException: (sqlite3.OperationalError) near "Id": syntax error [SQL: 'SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) \n'

Any help would be great!

Edit: After implementing some suggestions from other users below, here is my working code:

import pandas as pd
from pandasql import sqldf
import numpy as np
tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252',  low_memory=False)

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

print(sqldf(query, locals()))

It looks like the problem is in your SELECT statement, and you will probably have the same problem with the GROUP BY clause. I would test first with SELECT * FROM tasks. I am guessing the column names need to follow snake case: work_item_id
– It_is_Chris, Commented Jun 12, 2018 at 18:36
@Chris Thanks for getting back. I did exactly that, as another user mentioned below, and I am getting another error for some reason.
– rmahesh, Commented Jun 12, 2018 at 18:41

zwer · Accepted Answer · 2018-06-12 18:38:14Z


If you have column names that contain spaces, you have to quote them to make the SQL valid:

query = """SELECT `Work Item Id`, `Parent Work Item Id`, MAX(`Remaining Work`) 
FROM tasks
GROUP BY `Work Item Id`, `Parent Work Item Id`;"""

or

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks
GROUP BY [Work Item Id], [Parent Work Item Id];"""

Which form works depends on the SQL flavor pandasql is running against. The sqlite3 in your error message points to SQLite, which accepts both.
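Because the error comes from sqlite3, the quoting rules can be checked against SQLite directly, without pandasql. The following is a minimal sketch under that assumption: the in-memory table, its three rows, and the helper `run` are made up for illustration and stand in for the asker's CSV data.

```python
import sqlite3

# Hypothetical in-memory table standing in for the asker's CSV data.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tasks ('
    '"Work Item Id" INTEGER, "Parent Work Item Id" INTEGER, "Remaining Work" REAL)'
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [(1, 10, 2.0), (1, 10, 5.0), (2, 10, 1.5)],
)

# The same query as in the answer, with the quoting style left as a placeholder.
template = """SELECT {w}, {p}, MAX({r})
FROM tasks
GROUP BY {w}, {p}
ORDER BY {w};"""


def run(quote_open, quote_close):
    """Run the query with each column name wrapped in the given quote characters."""
    def quoted(name):
        return f"{quote_open}{name}{quote_close}"

    sql = template.format(
        w=quoted("Work Item Id"),
        p=quoted("Parent Work Item Id"),
        r=quoted("Remaining Work"),
    )
    return conn.execute(sql).fetchall()


backtick_rows = run("`", "`")       # MySQL-style quoting
bracket_rows = run("[", "]")        # SQL Server-style quoting
double_quote_rows = run('"', '"')   # standard SQL quoting

# Without any quoting, SQLite cannot parse the multi-word column names.
try:
    run("", "")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(backtick_rows, bracket_rows, double_quote_rows, unquoted_error, sep="\n")
```

All three quoted variants return the same grouped rows, while the unquoted query raises sqlite3.OperationalError, the same exception class as in the question.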


8 Comments

rmahesh Over a year ago

This seemed to have worked. But I am still getting this message and am unsure of what to do: sys:1: DtypeWarning: Columns (32) have mixed types. Specify dtype option on import or set low_memory=False.

It_is_Chris Over a year ago

@rmahesh check this post out: stackoverflow.com/questions/24251219/…

zwer Over a year ago

@rmahesh - That's unrelated to the SQL itself; it means that your CSV file has mixed data types in some of its columns, so you'll have to tell pandas how to convert those columns. Read this answer for more info.

rmahesh Over a year ago

@zwer I have made an edit with the current code. I am still getting numerous different errors.

zwer Over a year ago

@rmahesh - Without a sample of your data and traceback of those errors it would be impossible for us to determine why they occur. They are certainly not related to this question which, I believe, is resolved by my answer above so create a new question to deal with further issues you might encounter.
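
On the DtypeWarning raised in the comments: besides low_memory=False, the warning's other suggestion is to pass dtype to read_csv. The sketch below assumes a small inline CSV and treats "Remaining Work" as the mixed-type column; both are hypothetical, since the thread only identifies the real column as number 32.

```python
import io

import pandas as pd

# Hypothetical data: one column mixes numbers with free text.
csv_text = "Work Item Id,Remaining Work\n1,5\n2,unknown\n3,3.5\n"

# Read the mixed column as text so pandas does not have to guess its type.
tasks = pd.read_csv(io.StringIO(csv_text), dtype={"Remaining Work": str})

# Convert to numbers afterwards; values that cannot be parsed become NaN.
remaining = pd.to_numeric(tasks["Remaining Work"], errors="coerce")

print(tasks["Remaining Work"].tolist())
print(remaining.tolist())
```

Reading the column as text first and converting afterwards makes the handling of bad values an explicit choice instead of something inferred during parsing.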
