
I am developing a Python package that will be deployed to a Databricks cluster. We often need references to the `spark` and `dbutils` objects within the Python code.

Within a notebook we can access these objects directly, e.g. `spark.sql()`. How do we get the Spark instance within the Python code of the package?

1 Answer

Use `SparkSession.builder.getOrCreate`. From the PySpark documentation:

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.

This method first checks whether there is a valid global default SparkSession, and if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

So whenever you need an instance of SparkSession and don't want to pass it as an argument:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

