
I’m working on Databricks with a cluster running Runtime 16.4, which includes Spark 3.5.2 and Scala 2.12.

For a specific need, I want to implement custom write behavior for Delta tables by manually managing Delta transactions from PySpark. To do this, I want to access the Delta Lake transaction engine through the JVM embedded in the Spark session, specifically by using the class:

org.apache.spark.sql.delta.DeltaLog
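
For reference, this is roughly the access pattern I am attempting from PySpark. It is only a sketch: DeltaLog.forTable is the entry point I expect based on the open-source Delta Lake Scala API, and the table path is a placeholder.

# Sketch of the access pattern I am attempting (open-source Delta class names;
# the table path below is only a placeholder).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reach into the driver JVM through py4j.
DeltaLog = spark._jvm.org.apache.spark.sql.delta.DeltaLog

# forTable(SparkSession, String) is how the Scala API resolves the log for a table path.
delta_log = DeltaLog.forTable(spark._jsparkSession, "/mnt/example/delta_table")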

Issue

When I try to use classes from the package org.apache.spark.sql.delta directly from PySpark (through spark._jvm), the classes are not found unless the Delta Core package is explicitly installed on the cluster.

When I install the Delta Core Python package to gain access, I encounter the following Python import error:

ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package

Without the Delta Core package installed, accessing DeltaLog simply returns a generic JavaPackage object that is unusable.
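
This is how the failure shows up in practice. The small check below is what I use to confirm that py4j is handing back a placeholder rather than the real class (it relies on nothing beyond py4j itself):

# Diagnostic: when the class is not on the driver classpath, py4j returns a
# JavaPackage placeholder instead of a usable JavaClass.
from py4j.java_gateway import JavaClass, JavaPackage

candidate = spark._jvm.org.apache.spark.sql.delta.DeltaLog

if isinstance(candidate, JavaPackage):
    print("DeltaLog did not resolve: got a JavaPackage placeholder")
elif isinstance(candidate, JavaClass):
    print("DeltaLog resolved to a JVM class")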

What I want to do

Access the Delta transaction log API (DeltaLog) from PySpark via the JVM.

Be able to start transactions and commit them manually to implement custom write behavior (see the sketch after this list).

Work within the Databricks Runtime 16.4 environment without conflicts or missing dependencies.
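
To make the goal concrete, this is the flow I have in mind, written against the open-source Delta Lake Scala API as I understand it (startTransaction and OptimisticTransaction.commit). I have left the commit call commented out because building the sequence of actions from Python is exactly the part I am unsure about, and the class paths may not match what Databricks Runtime actually ships.

# Rough sketch of the manual-commit flow (open-source Delta API names;
# I am not certain these match the classes shipped in Databricks Runtime 16.4).
table_path = "/mnt/example/delta_table"  # placeholder

DeltaLog = spark._jvm.org.apache.spark.sql.delta.DeltaLog
delta_log = DeltaLog.forTable(spark._jsparkSession, table_path)

# Open an optimistic transaction against the current table snapshot.
txn = delta_log.startTransaction()

# Build AddFile / RemoveFile actions on the JVM side, then commit them with an
# operation descriptor. This is the step I want to control manually instead of
# going through DataFrameWriter:
# txn.commit(actions, operation)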

Questions

How can I correctly access and use org.apache.spark.sql.delta.DeltaLog from PySpark on Databricks Runtime 16.4?

Is there a supported way to manually manage Delta transactions through the JVM in this environment?

What is the correct setup or package dependency to avoid the ModuleNotFoundError when installing the Delta Core Python package?

Are there any alternatives or recommended patterns to achieve manual Delta commits programmatically on Databricks?

  • I think it would help you get answers quickly if you post this question in the Databricks community as well: community.databricks.com Commented Jun 19 at 4:44
  • I haven't used the Databricks runtime before, but how did you install your Delta Core package? Did you add the jar name to your Spark config, or did you download the jar from a repository (e.g. Maven) and put it in a specific location when you launch a Spark app? Commented Jun 20 at 5:28
  • I just installed delta-core from the Maven interface on my cluster. Commented Jun 20 at 8:23
