
I want to read an Excel file with multiple sheets from my Azure Blob Storage (Gen2) using Databricks PySpark. I have already installed the Maven package. Below is my code:

df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("useHeader", "true") \
    .option("treatEmptyValuesAsNulls", "true") \
    .option("inferSchema", "true") \
    .option("sheetName", "sheet1") \
    .option("maxRowsInMemory", 10) \
    .load(file_path)

Running this code, I get the following error:

Py4JJavaError: An error occurred while calling o323.load.
: java.lang.NoClassDefFoundError: Could not initialize class com.crealytics.spark.excel.WorkbookReader$
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:22)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Any help is appreciated. Thanks

  • Have you attached the library to a cluster? Commented Dec 8, 2021 at 16:23
  • Hello @AlexOtt, yes, I already attached the library to the cluster and notebook. Commented Dec 9, 2021 at 9:35
  • Is it compiled for the correct version of Scala, matching the DBR version? Commented Dec 9, 2021 at 11:05
  • @AlexOtt To be sure, I installed both the Scala 2.11 and Scala 2.12 builds, and neither works. My DBR version is "9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)". (See the coordinate sketch after these comments.) Commented Dec 9, 2021 at 14:55
  • @KarthikBhyresh-MT, not yet. I am using pandas as an alternative for now, but I hope to solve this issue. Commented Dec 16, 2021 at 13:13
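
A note on the Scala-version point raised in the comments: the Maven coordinate's _2.11/_2.12 suffix must match the cluster's Scala version, and only one build should be attached at a time (mixing Scala versions on the classpath can itself produce this kind of NoClassDefFoundError). A minimal sketch of a matching install, where the exact artifact version is an assumption to verify on Maven Central:

# Cluster -> Libraries -> Install New -> Maven, with a coordinate such as:
#   com.crealytics:spark-excel_2.12:0.13.7
# The _2.12 suffix matches DBR 9.1 LTS (Scala 2.12); the 0.13.7 version
# number is an assumption -- pick a release built against Spark 3.1.x,
# and detach any _2.11 build before restarting the cluster.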

1 Answer


Can you verify that you have properly mounted the Azure Blob storage container?

Check out the official MS doc: Access Azure Blob storage using the RDD API.
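
If the container is not mounted yet, here is a minimal mounting sketch; the container, storage account, mount point, and secret scope/key names are all placeholders to replace with your own:

# Minimal mount sketch -- every <...> name below is a placeholder.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/excel-data",
    extra_configs={
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope-name>", key="<key-name>")
    }
)

After mounting, the file_path in the question can point at /mnt/excel-data/<your-file>.xlsx.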

Hadoop configuration options are not accessible via SparkContext. If you are using the RDD API to read from Azure Blob storage, you must set the Hadoop credential configuration properties as Spark configuration options when you create the cluster, adding the spark.hadoop. prefix to the corresponding Hadoop configuration keys to propagate them to the Hadoop configurations that are used for your RDD jobs.

Configure an account access key:

spark.hadoop.fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
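
Alternatively, for DataFrame reads the same key can be set on the running session (per the quoted doc, only RDD jobs require the spark.hadoop.-prefixed form at cluster creation). A minimal sketch, with placeholder account, key, and path names:

# Set the storage account key for the current session (placeholders in <...>).
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
    "<storage-account-access-key>"
)

# Retry the Excel read against the wasbs:// path once the credential is in place.
df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/path/to/file.xlsx"))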

