
I want to change my Spark session so that my DataFrames are 'pyspark.sql.dataframe.DataFrame' instead of 'pyspark.sql.connect.dataframe.DataFrame', so that I can run StringIndexer and VectorAssembler.

If I run it on a pyspark.sql.connect.dataframe.DataFrame, I get an AssertionError:

File /databricks/python/lib/python3.12/site-packages/pyspark/ml/wrapper.py:87, in JavaWrapper._new_java_obj(java_class, *args)
     84 from pyspark.core.context import SparkContext
     86 sc = SparkContext._active_spark_context
---> 87 assert sc is not None

Thank you!

1 Answer

Answering my own question just in case somebody googles it:

It's not possible in Databricks Community Edition, since it only provides SQL warehouse compute, and the Spark Connect session is the default there. In the paid version, switch the notebook's compute from the SQL warehouse to a different (classic) compute.

P.S. pyspark.ml functions are not available over Spark Connect on Databricks. You can do ML only through scikit-learn.
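Not part of the original answer, but a small heuristic check can tell you which kind of DataFrame your notebook is handing back before you reach for pyspark.ml: Spark Connect DataFrame classes live under the pyspark.sql.connect package, so inspecting the class's module is enough. The helper name below is my own; the demonstration uses a stand-in class so it runs without a Spark cluster — in a real notebook you would pass an actual DataFrame.

```python
def is_connect_dataframe(df) -> bool:
    """Heuristic: Spark Connect DataFrames live in pyspark.sql.connect.*"""
    return type(df).__module__.startswith("pyspark.sql.connect")


# Stand-in class for demonstration only (no Spark needed here);
# a real pyspark.sql.connect.dataframe.DataFrame reports the same module.
class _FakeConnectDF:
    pass


_FakeConnectDF.__module__ = "pyspark.sql.connect.dataframe"

print(is_connect_dataframe(_FakeConnectDF()))  # → True
```

If this returns True, the classic JVM-backed pyspark.ml wrappers (StringIndexer, VectorAssembler, etc.) will hit the `assert sc is not None` failure shown above, because there is no active SparkContext under Spark Connect.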
