
I am trying to use Spark SQL from a Java program, where the dependency in pom.xml points to Spark version 1.6.0. Below is the program:

package spark_test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

public class MyTest {
    private static SparkConf sparkConf;

    public static void main(String[] args) {
        String warehouseLocation = args[0];
        sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]")
                .set("spark.sql.warehouse.dir", warehouseLocation);

        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        // HiveContext extends SQLContext and adds access to the Hive metastore
        SQLContext sc = new HiveContext(ctx.sc());

        System.out.println(" Current Tables: ");

        // In Spark 1.6, sql() returns a DataFrame
        DataFrame results = sc.sql("show tables");
        results.show();
    }
}

However, I am getting:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;

I am creating a flat jar and running it from the command line:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/PortalHandlerTest.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/SparkTest.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [file:/home/cloudera/workspace/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/JARs/slf4j-log4j12-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/25 08:44:07 INFO SparkContext: Running Spark version 2.1.0
17/04/25 08:44:07 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/04/25 08:44:07 WARN SparkContext: Support for Scala 2.10 is deprecated as of Spark 2.1.0
17/04/25 08:44:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/25 08:44:08 INFO SecurityManager: Changing view acls to: cloudera
17/04/25 08:44:08 INFO SecurityManager: Changing modify acls to: cloudera
17/04/25 08:44:08 INFO SecurityManager: Changing view acls groups to: 
17/04/25 08:44:08 INFO SecurityManager: Changing modify acls groups to: 
17/04/25 08:44:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cloudera); groups with view permissions: Set(); users  with modify permissions: Set(cloudera); groups with modify permissions: Set()
17/04/25 08:44:09 INFO Utils: Successfully started service 'sparkDriver' on port 43850.
17/04/25 08:44:09 INFO SparkEnv: Registering MapOutputTracker
17/04/25 08:44:09 INFO SparkEnv: Registering BlockManagerMaster
17/04/25 08:44:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/04/25 08:44:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/04/25 08:44:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4199c353-4e21-4863-8b78-cfa280ce2de3
17/04/25 08:44:09 INFO MemoryStore: MemoryStore started with capacity 375.7 MB
17/04/25 08:44:09 INFO SparkEnv: Registering OutputCommitCoordinator
17/04/25 08:44:09 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/04/25 08:44:09 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/04/25 08:44:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
17/04/25 08:44:10 INFO Executor: Starting executor ID driver on host localhost
17/04/25 08:44:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41716.
17/04/25 08:44:10 INFO NettyBlockTransferService: Server created on 10.0.2.15:41716
17/04/25 08:44:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/04/25 08:44:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41716, None)
17/04/25 08:44:10 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41716 with 375.7 MB RAM, BlockManagerId(driver, 10.0.2.15, 41716, None)
17/04/25 08:44:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41716, None)
17/04/25 08:44:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 41716, None)
Current Tables: 
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
at spark_test.MyTest.main(MyTest.java:31)
17/04/25 08:44:10 INFO SparkContext: Invoking stop() from shutdown hook
17/04/25 08:44:10 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4041
17/04/25 08:44:10 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/25 08:44:10 INFO MemoryStore: MemoryStore cleared
17/04/25 08:44:10 INFO BlockManager: BlockManager stopped
17/04/25 08:44:10 INFO BlockManagerMaster: BlockManagerMaster stopped
17/04/25 08:44:10 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/25 08:44:10 INFO SparkContext: Successfully stopped SparkContext
17/04/25 08:44:10 INFO ShutdownHookManager: Shutdown hook called
17/04/25 08:44:10 INFO ShutdownHookManager: Deleting directory /tmp/spark-93fca3d1-ff79-4d2b-b07f-a340c1a60416

This is probably because my pom has Spark version 1.6.0 while the Cloudera VM is running 2.1.0. spark-shell reports Spark version 1.6.0 and works fine. How do I force my Java program to use version 1.6.0?

Any help would be appreciated.

2 Answers


DataFrame was supplanted in Spark 2 by Dataset. If you are running a Spark 1.6 client against a Spark 2.1 runtime, you'll need to import org.apache.spark.sql.Dataset and use Dataset<Row> instead. Most of the API is similar from a developer's perspective. Honestly, though, you'd be a lot better off using at least Spark 2.0 dependencies in your client, if not the exact server version.
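
For reference, here is a minimal sketch of how the question's program might be ported to the Spark 2.x API, using SparkSession (which replaces SQLContext/HiveContext) and Dataset<Row>; the warehouse-location argument is carried over from the original program:

package spark_test;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MyTest {
    public static void main(String[] args) {
        String warehouseLocation = args[0];

        // SparkSession is the single entry point in Spark 2.x;
        // enableHiveSupport() replaces the old HiveContext
        SparkSession spark = SparkSession.builder()
                .appName("Hive Test")
                .master("local[*]")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .enableHiveSupport()
                .getOrCreate();

        System.out.println(" Current Tables: ");

        // In Spark 2.x, sql() returns Dataset<Row> rather than DataFrame
        Dataset<Row> results = spark.sql("show tables");
        results.show();

        spark.stop();
    }
}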


4 Comments

I compiled the same program under Spark 2.1.0 to use Datasets, but I cannot read any Hive table through the program, even after setting warehouse.dir and copying hive-site.xml into /usr/lib/spark/conf/. Please look at the post here
That's a different problem. I'll address it in your other post.
If your original question was resolved via the steps noted, please acknowledge.
Dataset is not available in Spark 1.6.0, so once I compiled with Spark 2.1.0, using Dataset did resolve this issue.

Your logs show that you are running with Spark 2.1 libraries against a 1.6.0 Spark cluster. My guess is that your client and server libraries are not binary compatible. I would suggest using the same versions in your application as exist on the server to ensure compatibility.
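
For instance, if the cluster really is on Spark 1.6.0 (which shipped built against Scala 2.10), the pom.xml dependencies might be pinned like this; the versions and Scala suffix here are assumptions that should be matched to the actual cluster install, and marking the artifacts provided keeps the flat jar from bundling a conflicting Spark runtime:

<dependencies>
    <!-- Illustrative versions: match these to the Spark install on the cluster -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
        <!-- provided: the cluster supplies Spark at runtime, so don't bundle it -->
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>
</dependencies>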

1 Comment

That is my guess as well. However, I am not sure how to force my Java program to run Spark 1.6.0 and not Spark 2.1.0. Also, I compiled the same program under Spark 2.1.0, but I cannot read any Hive table through the program, even after setting warehouse.dir and copying hive-site.xml into /usr/lib/spark/conf/. Please look at the post here
