I am trying to write a query in Python using pandasql. My code is below:
import pandas as pd
from pandasql import sqldf
data = pd.read_csv('registerlog.csv')
q = """
SELECT
    a.RegistrationMonth,
    COUNT(DISTINCT a.UserID) AS UserSize,
    COUNT(
        CASE a.MonthDifference
            WHEN 0.0 THEN DISTINCT a.UserID ELSE NULL
        END
    ) AS MonthZero
FROM
    data AS a
GROUP BY
    a.RegistrationMonth
"""
print sqldf(q, locals())
But this gives the following error,
    print sqldf(q, locals())
  File "C:\Python27\lib\site-packages\pandasql\sqldf.py", line 156, in sqldf
    return PandaSQL(db_uri)(query, env)
  File "C:\Python27\lib\site-packages\pandasql\sqldf.py", line 63, in __call__
    raise PandaSQLException(ex)
PandaSQLException: (sqlite3.OperationalError) near "DISTINCT": syntax error
If I use WHEN 0.0 THEN a.UserID ELSE NULL instead, the query works. A plain COUNT(DISTINCT a.UserID) also works fine.
However, I want to count only the distinct UserID values that match the CASE condition. Is there a way to get a DISTINCT count from inside the CASE?
Have you tried putting DISTINCT outside the CASE, i.e. COUNT(DISTINCT (CASE ... END)) AS MonthZero?
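To illustrate that suggestion: in SQLite, DISTINCT is a modifier on the aggregate call, not something a CASE branch can return, so it has to sit directly after COUNT(. Below is a minimal sketch using the standard-library sqlite3 module rather than pandasql itself (pandasql runs its queries through SQLite by default, so the same query string should work with sqldf). The sample rows are made up for illustration and are not from registerlog.csv.

```python
import sqlite3

# Hypothetical sample rows standing in for registerlog.csv:
# (RegistrationMonth, UserID, MonthDifference)
rows = [
    ("2015-01", 1, 0.0),
    ("2015-01", 1, 0.0),  # same user appears twice with MonthDifference 0
    ("2015-01", 2, 1.0),
    ("2015-01", 3, 0.0),
    ("2015-02", 4, 0.0),
    ("2015-02", 5, 2.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data "
    "(RegistrationMonth TEXT, UserID INTEGER, MonthDifference REAL)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# DISTINCT goes right after COUNT(, and the CASE yields either the
# UserID or NULL. COUNT ignores NULLs, so only matching users are counted.
q = """
SELECT
    a.RegistrationMonth,
    COUNT(DISTINCT a.UserID) AS UserSize,
    COUNT(DISTINCT CASE a.MonthDifference
                       WHEN 0.0 THEN a.UserID
                   END) AS MonthZero
FROM data AS a
GROUP BY a.RegistrationMonth
ORDER BY a.RegistrationMonth
"""

result = conn.execute(q).fetchall()
for row in result:
    print(row)
```

The ELSE NULL branch is omitted because a CASE with no matching WHEN already evaluates to NULL. The duplicate row for user 1 is there to show that the user is counted only once in MonthZero.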