I am trying to run SQL queries against pandas DataFrames in Python as if they were tables in Microsoft SQL Server. Searching around, it appears that R has the sqldf package and Python has an equivalent in pandasql. However, I am unable to get Rodeo to work, if that is a requirement.
This blog describes the above. I am unable to import sqldf or pandasql by running any combination of:
import pandasql as pdsql
from pandasql import sqldf
pysql = lambda q: pdsql.sqldf(q, globals())
which I scavenged from here and there.
In SAS, you can manipulate SAS datasets using PROC SQL like this:
PROC SQL;
SELECT
b.patid,
CASE WHEN ECD='1234' THEN 'ACTIVE' ELSE 'INACTIVE' END AS ACTIVE_INACTIVE,
b.SUMMARY_ID
FROM SAStable1 a
LEFT JOIN SAStable2 b
ON a.patient_id=b.patid
;
QUIT;
This lets me write a SQL query directly against SAS datasets within SAS. It is different from the pandas.read_sql_query(query, connection) function, which works great for running SQL queries against connected databases, but not against actual dataframes once they are in Python (unless I am missing something).
Is there anything like this for Python? Given that it's available in SAS and R, I would be surprised if there weren't, but my searches have yielded nothing actionable.
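To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It is a workaround, not a direct query on the dataframes: it copies them into an in-memory SQLite database with DataFrame.to_sql and reads the result back with pandas.read_sql_query. The dataframes table1 and table2 and their values are made-up stand-ins for my SAS datasets, and I am assuming the ELSE branch of the CASE expression should read 'INACTIVE'.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-ins for SAStable1 and SAStable2 (made-up values).
table1 = pd.DataFrame({"patient_id": [1, 2, 3]})
table2 = pd.DataFrame(
    {"patid": [1, 2], "ECD": ["1234", "9999"], "SUMMARY_ID": [10, 20]}
)

# Copy the dataframes into a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")
table1.to_sql("table1", conn, index=False)
table2.to_sql("table2", conn, index=False)

# Same shape as the PROC SQL query, assuming ELSE should be 'INACTIVE'.
query = """
SELECT
    b.patid,
    CASE WHEN b.ECD = '1234' THEN 'ACTIVE' ELSE 'INACTIVE' END AS ACTIVE_INACTIVE,
    b.SUMMARY_ID
FROM table1 a
LEFT JOIN table2 b
    ON a.patient_id = b.patid
ORDER BY a.patient_id
"""

# The query result comes back as an ordinary dataframe.
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

This works, but it means copying every dataframe into a database first, which is the round trip I am hoping to avoid.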
Thanks!