How do I start a basic query in Databricks using Python?

The data I need is in Databricks. So far I have been using JupyterHub to pull the data and modify a few things. Now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and schedule it as a job.

I started with the following:

%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')

and got the following error:

TypeError: read_sql() missing 1 required positional argument: 'con'

So I tried this update:

%python
import pandas as pd
import pyodbc

odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver) 

df = pd.read_sql('select * from databasename.tablename', con=conn)

and got the following error:

ModuleNotFoundError: No module named 'pyodbc'

Can anyone please help? I can use SQL to pull the data, but I already have a lot of Python code that I don't know how to convert to SQL. So for now I just want my Python code to work in Databricks.

You're confusing pandas with PySpark. Commented Jan 10, 2023 at 17:26

1 Answer

You should use Spark's SQL facilities directly:

my_df = spark.sql('select * FROM databasename.tablename') 
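
Note that spark.sql returns a Spark DataFrame, not a pandas one. Since the existing code in the question is written for pandas, one option is to convert the result with toPandas(). Below is a minimal sketch, assuming it runs in a Databricks notebook where the spark session is already defined. The helper name load_table_as_pandas is hypothetical, and toPandas() collects the whole result onto the driver, so this only suits tables that fit in memory.

```python
def load_table_as_pandas(spark, table_name):
    """Query a table through a SparkSession and return the result as a pandas DataFrame.

    `spark` is assumed to be the SparkSession that Databricks notebooks
    provide automatically; `table_name` is something like
    'databasename.tablename'.
    """
    # spark.sql returns a Spark DataFrame (distributed, lazily evaluated).
    spark_df = spark.sql(f"SELECT * FROM {table_name}")
    # toPandas() pulls all rows to the driver and builds a pandas DataFrame.
    return spark_df.toPandas()


# Usage inside a Databricks notebook (not executed here):
# df = load_table_as_pandas(spark, "databasename.tablename")
# df.head()
```

After the conversion, df behaves like any other pandas DataFrame, so the rest of the pandas code should not need pyodbc or a connection object.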