Python in Databricks

Question

How do I even start a basic query in Databricks using Python?

The data I need is in Databricks. So far I have been using JupyterHub to pull the data and modify a few things, but now I want to eliminate the step of pulling the data into JupyterHub, move my Python code directly into Databricks, and schedule it as a job.

I started with the following:

%python
import pandas as pd
df = pd.read_sql('select * from databasename.tablename')

and got the error below:

TypeError: read_sql() missing 1 required positional argument: 'con'

So I tried updating it:

%python
import pandas as pd
import pyodbc

odbc_driver = pyodbc.drivers()[0]
conn = pyodbc.connect(odbc_driver) 

df = pd.read_sql('select * from databasename.tablename', con=conn)

and I got the error below:

ModuleNotFoundError: No module named 'pyodbc'

Can anyone please help? I can use SQL to pull the data, but I already have a lot of Python code that I don't know how to convert to SQL. So for now I just want my Python code to work in Databricks.

You're confusing pandas with PySpark.

– Kashyap, Commented Jan 10, 2023 at 17:26

pygri · Accepted Answer · 2023-01-10 16:32:17Z

You should use Spark's SQL facilities directly:

my_df = spark.sql('select * FROM databasename.tablename')
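
Note that `spark.sql` returns a Spark DataFrame, not a pandas one. If the existing code expects pandas, one option is to convert the result with `toPandas()`. The sketch below is only an illustration: the helper name `load_table_as_pandas` is made up here, and it assumes the `spark` session object that Databricks notebooks provide and a result small enough to fit in driver memory.

```python
def load_table_as_pandas(spark, table_name):
    """Query a table through Spark SQL and return the result as a pandas DataFrame.

    `spark` is assumed to be a SparkSession, such as the one Databricks
    notebooks predefine. `table_name` is interpolated directly into the
    query, so it should come from trusted input only.
    """
    # spark.sql returns a Spark DataFrame (distributed, lazily evaluated).
    spark_df = spark.sql(f"SELECT * FROM {table_name}")
    # toPandas() collects all rows to the driver, so it only suits
    # results that fit in the driver's memory.
    return spark_df.toPandas()


# In a Databricks notebook, where `spark` is already defined:
# df = load_table_as_pandas(spark, "databasename.tablename")
```

After the conversion, `df` is an ordinary pandas DataFrame, so the existing pandas code should be able to run on it unchanged. For large tables it is probably better to filter or aggregate in the SQL query first and convert only the reduced result.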
