
I ran this code and got an error.

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.jars.repositories", "https://repos.spark-packages.org/")
        .config("spark.jars.packages", "saurfang:spark-sas7bdat:2.0.0-s_2.11,org.apache.hadoop:hadoop-aws:2.7.0")
        .enableHiveSupport()
        .getOrCreate()
    )

    df_spark_temp = spark.read.format('com.github.saurfang.sas.spark') \
        .load('18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat')
    # Note: pandas DataFrames have no .show() method; print the result instead.
    print(df_spark_temp.limit(5).toPandas())
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: java.lang.NoClassDefFoundError: scala/Product$class
        at com.github.saurfang.sas.spark.SasRelation.<init>(SasRelation.scala:48)
        at com.github.saurfang.sas.spark.SasRelation$.apply(SasRelation.scala:42)
        at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:50)
        at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:39)
        at com.github.saurfang.sas.spark.DefaultSource.createRelation(DefaultSource.scala:27)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
        ... 23 more

Python version: 3.9.6
Java version: 17.0.4.1
PySpark version: 3.3.0

I searched for the same issue on Stack Overflow, and most answers said it may be caused by the Scala version.
I have never installed Scala before. Do I need to install Scala, or can I change a setting in Java?

And when I run pyspark --version, it shows:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/
                        
Using Scala version 2.12.15, Java HotSpot(TM) 64-Bit Server VM, 17.0.4.1

Does this mean I need to install Scala 2.12.15, or do I already have it installed?

1 Answer


All libraries must be compiled for the same Scala version you are running with.
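Regarding whether you already have Scala: Spark bundles its own Scala runtime, so there is nothing extra to install. As a sketch, one way to confirm the bundled version from inside PySpark (this goes through the internal _jvm gateway, so treat it as a debugging aid rather than a public API):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Ask the JVM-side Scala standard library for its version string.
    print(spark.sparkContext._jvm.scala.util.Properties.versionString())
    # On Spark 3.3.0 this should print something like "version 2.12.15".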

I'm not familiar with PySpark, but at least spark-sas7bdat:2.0.0-s_2.11 appears to be compiled for Scala 2.11, given the -s_2.11 suffix in its version number. That matches your stack trace: scala.Product$class is an artifact of the Scala 2.11 trait encoding, which no longer exists in 2.12, hence the NoClassDefFoundError.

If you're running with Scala 2.12, use spark-sas7bdat:3.0.0-s_2.12 instead.
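For reference, a minimal sketch of the corrected session setup, reusing the repository and file path from the question (the hadoop-aws coordinate is dropped here; add it back if you need S3 access):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.jars.repositories", "https://repos.spark-packages.org/")
        # 3.0.0-s_2.12 is the Scala 2.12 build, matching Spark 3.3.0's bundled Scala
        .config("spark.jars.packages", "saurfang:spark-sas7bdat:3.0.0-s_2.12")
        .enableHiveSupport()
        .getOrCreate()
    )

    df = (spark.read.format('com.github.saurfang.sas.spark')
          .load('18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat'))
    df.show(5)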

Personal note: this library appears to be unmaintained; consider using another one if this is for production code.
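If this is only for a study project, one alternative that avoids the third-party package entirely is pandas' built-in SAS reader, followed by a conversion to Spark. A sketch, with the caveat that this loads the whole file into driver memory and that string columns may come back as raw bytes:

    import pandas as pd

    # Read the SAS file with pandas, then hand the result to Spark.
    pdf = pd.read_sas('18-83510-I94-Data-2016/i94_apr16_sub.sas7bdat', format='sas7bdat')
    df_spark = spark.createDataFrame(pdf)
    df_spark.show(5)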


2 Comments

Thanks Gael! It works when using spark-sas7bdat:3.0.0-s_2.12. By "the library", do you mean this repo: https://repos.spark-packages.org/? I'm only doing this for studying, but I'll take note. Thanks again.
