
The code below is in Python, and I want to convert it to PySpark. Basically, I'm not sure what the PySpark equivalent of the statement pd.read_sql(query, connect_to_hive) would be.

I need to extract data from the EDL, so I'm making a connection to the EDL using pyodbc and then extracting the data with a SQL query.

pyodbc connection to the Enterprise Data Lake:

import pyodbc
import pandas as pd

connect_to_hive = pyodbc.connect("DSN=Hive", autocommit=True)
transaction = pd.read_sql(query, connect_to_hive)
connect_to_hive.close()

# Query: below is just a basic SQL query to replicate this problem.

query = '''
with trans as (
    SELECT
        a.employee_name,
        a.employee_id
    FROM EMP a
)
SELECT * FROM trans
'''

1 Answer


The above code can be converted to SparkSQL code as follows:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

query = '''
with trans as (
    SELECT
        a.employee_name,
        a.employee_id
    FROM EMP a
)
SELECT * FROM trans
'''

employeeDF = spark.sql(query)

employeeDF.show(truncate=False)

The query will run as-is on Hive, and the result will be available to you as a Spark DataFrame.
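If you still need a pandas DataFrame (mirroring what pd.read_sql returned in the original code), here is a minimal sketch of the bridge back to pandas, assuming the Hive-enabled session and the query shown above:

from pyspark.sql import SparkSession

# Reuse (or create) the Hive-enabled Spark session
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Run the query against Hive; toPandas() collects the result to the driver
# as a pandas DataFrame, roughly the equivalent of pd.read_sql(query, connect_to_hive).
# Note: this pulls all rows into driver memory, so only use it for result sets that fit.
employeeDF = spark.sql(query)
transaction = employeeDF.toPandas()
print(transaction.head())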


