I am very new to PandaSQL and have never used it before. Here is my code up until now:

import pandas as pd
from pandasql import sqldf
import numpy as np

tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252')
query = """SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) 
FROM TASKS 
GROUP BY Work Item Id, Parent Work Item Id;"""

df = sqldf(query, locals())
print(df.head(5))

I am getting this error:

'pandasql.sqldf.PandaSQLException: (sqlite3.OperationalError) near "Id": syntax error [SQL: 'SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) \n'

Any help would be great!

Edit: After implementing some suggestions from other users below, here is my working code:

import pandas as pd
from pandasql import sqldf
import numpy as np
tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252',  low_memory=False)

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

print(sqldf(query, locals()))
  • It looks like the problem is your SELECT statement, and you will probably have problems with the GROUP BY clause as well. I would test first with SELECT * FROM tasks. I am guessing the column names need to follow snake-case formatting: work_item_id Commented Jun 12, 2018 at 18:36
  • @Chris Thanks for getting back. I did exactly that, as another user mentioned below, and I am getting another error for some reason. Commented Jun 12, 2018 at 18:41

1 Answer

If you have column names that contain spaces, you have to quote them to make the SQL valid:

query = """SELECT `Work Item Id`, `Parent Work Item Id`, MAX(`Remaining Work`) 
FROM tasks
GROUP BY `Work Item Id`, `Parent Work Item Id`;"""

or

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks
GROUP BY [Work Item Id], [Parent Work Item Id];"""

Which one works depends on the SQL flavor pandasql expects.
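
The exception in the question is raised by sqlite3, so the quoting rule can be checked against SQLite directly with Python's built-in sqlite3 module, without pandasql or the original CSV. The following is only a minimal sketch: the sample rows are made up, the table is created by hand rather than loaded from a DataFrame, and the helper max_remaining is a hypothetical name used for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for the CSV from the question:
# (Work Item Id, Parent Work Item Id, Remaining Work)
rows = [
    (1, 10, 4.0),
    (1, 10, 7.5),
    (2, 10, 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tasks ("Work Item Id" INTEGER, '
    '"Parent Work Item Id" INTEGER, "Remaining Work" REAL)'
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", rows)


def max_remaining(open_q, close_q):
    """Run the grouped query using the given identifier-quoting characters."""
    def q(name):
        return f"{open_q}{name}{close_q}"

    sql = (
        f"SELECT {q('Work Item Id')}, {q('Parent Work Item Id')}, "
        f"MAX({q('Remaining Work')}) FROM tasks "
        f"GROUP BY {q('Work Item Id')}, {q('Parent Work Item Id')} "
        f"ORDER BY {q('Work Item Id')}"
    )
    return conn.execute(sql).fetchall()


# Three quoting styles that SQLite accepts for identifiers.
backtick_result = max_remaining("`", "`")
bracket_result = max_remaining("[", "]")
double_quote_result = max_remaining('"', '"')

# Unquoted names containing spaces fail at parse time, as in the question.
try:
    conn.execute("SELECT Work Item Id FROM tasks")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(backtick_result)
print(unquoted_error)
```

In this sketch all three quoting styles return the same grouped rows, while the unquoted query fails before it ever looks at the data. With pandasql, the same quoted query string would be passed to sqldf as in the question.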

8 Comments

This seems to have worked. But I am still getting this message and am unsure what to do: sys:1: DtypeWarning: Columns (32) have mixed types. Specify dtype option on import or set low_memory=False.
@rmahesh check this post out: stackoverflow.com/questions/24251219/…
@rmahesh - That's unrelated to the SQL itself. It means that your CSV file has mixed data types in some of its columns, so you'll have to tell Pandas how to convert those columns. Read this answer for more info.
@zwer I have made an edit with the current code. I am still getting numerous different errors.
@rmahesh - Without a sample of your data and a traceback of those errors, it is impossible for us to determine why they occur. They are certainly not related to this question, which I believe is resolved by my answer above, so create a new question to deal with any further issues you encounter.