
I'm new to Spark. I attempted to run a Spark app (.jar) on CDH 5.8.0-0 on Oracle VirtualBox 5.1.4r110228, which leveraged Spark Streaming to perform sentiment analysis on Twitter. I have my Twitter account created and all four required tokens generated, but I was blocked by the NoClassDefFoundError exception below.

I've been googling around for a couple of days. The best advice I found so far is at the URL below, but apparently my environment is still missing something.

http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html#ixzz4Ia99dsp0

What does it mean for a library to show up at compile time but be missing at runtime? How can we fix this?

Which library provides Logging? I came across an article stating that this Logging is subject to being deprecated. Besides that, I do see log4j in my environment.

In my CDH 5.8 environment, I'm running these software versions: spark-2.0.0-bin-hadoop2.7 / spark-core_2.10-2.0.0, jdk-8u101-linux-x64 / jre-8u101-linux-x64.

I appended the details of the exception at the end. Here is the procedure I performed to execute the app, plus some verification I did after hitting the exception:

  1. Unzip twitter-streaming.zip (the Spark app)
  2. cd twitter-streaming
  3. run ./sbt/sbt assembly
  4. Update env.sh with your Twitter account

$ cat env.sh

export SPARK_HOME=/home/cloudera/spark-2.0.0-bin-hadoop2.7
export CONSUMER_KEY=<my_consumer_key>
export CONSUMER_SECRET=<my_consumer_secret>
export ACCESS_TOKEN=<my_twitterapp_access_token>
export ACCESS_TOKEN_SECRET=<my_twitterapp_access_token_secret>

The submit.sh script wraps the spark-submit command with the required credential info from env.sh:

$ cat submit.sh

source ./env.sh
$SPARK_HOME/bin/spark-submit --class "TwitterStreamingApp" --master local[*] ./target/scala-2.10/twitter-streaming-assembly-1.0.jar $CONSUMER_KEY $CONSUMER_SECRET $ACCESS_TOKEN $ACCESS_TOKEN_SECRET
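
For reference, here is a rough sketch of what TwitterStreamingApp does (reconstructed from memory, so class names, argument handling, and the batch interval may differ slightly from the actual source). The TwitterUtils.createStream call near the bottom is the line the stack trace later points at:

    // Rough sketch of TwitterStreamingApp.scala -- details are approximate
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    object TwitterStreamingApp {
      def main(args: Array[String]): Unit = {
        val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)

        // twitter4j picks up OAuth credentials from these system properties
        System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
        System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
        System.setProperty("twitter4j.oauth.accessToken", accessToken)
        System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

        val conf = new SparkConf().setAppName("TwitterStreamingApp")
        val ssc = new StreamingContext(conf, Seconds(10))

        // The connector resolves credentials from the system properties set above
        val tweets = TwitterUtils.createStream(ssc, None)
        tweets.map(_.getText).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }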

The log of the assembly process:

[cloudera@quickstart twitter-streaming]$ ./sbt/sbt assembly

Launching sbt from sbt/sbt-launch-0.13.7.jar
[info] Loading project definition from /home/cloudera/workspace/twitter-streaming/project
[info] Set current project to twitter-streaming (in build file:/home/cloudera/workspace/twitter-streaming/)
[info] Including: twitter4j-stream-3.0.3.jar
[info] Including: twitter4j-core-3.0.3.jar
[info] Including: scala-library.jar
[info] Including: unused-1.0.0.jar
[info] Including: spark-streaming-twitter_2.10-1.4.1.jar
[info] Checking every *.class/*.jar file's SHA-1.
[info] Merging files...
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'first'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.properties' with strategy 'first'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.xml' with strategy 'first'
[warn] Merging 'log4j.properties' with strategy 'discard'
[warn] Merging 'org/apache/spark/unused/UnusedStubClass.class' with strategy 'first'
[warn] Strategy 'discard' was applied to 2 files
[warn] Strategy 'first' was applied to 4 files
[info] SHA-1: 69146d6fdecc2a97e346d36fafc86c2819d5bd8f
[info] Packaging /home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Aug 27, 2016 11:58:03 AM

I'm not sure exactly what it means, but everything looked good when I ran the Hadoop native check:

$ hadoop checknative -a

16/08/27 13:27:22 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/08/27 13:27:22 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so

Here is the console log of my exception:

$ ./submit.sh

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/28 20:13:23 INFO SparkContext: Running Spark version 2.0.0
16/08/28 20:13:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 20:13:24 WARN Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/08/28 20:13:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/28 20:13:24 INFO SecurityManager: Changing view acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing view acls groups to: 
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls groups to: 
16/08/28 20:13:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(cloudera); groups with view permissions: Set(); users  with modify permissions: Set(cloudera); groups with modify permissions: Set()
16/08/28 20:13:25 INFO Utils: Successfully started service 'sparkDriver' on port 37550.
16/08/28 20:13:25 INFO SparkEnv: Registering MapOutputTracker
16/08/28 20:13:25 INFO SparkEnv: Registering BlockManagerMaster
16/08/28 20:13:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37a0492e-67e3-4ad5-ac38-40448c25d523
16/08/28 20:13:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/28 20:13:25 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/28 20:13:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/28 20:13:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
16/08/28 20:13:25 INFO SparkContext: Added JAR file:/home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.1.jar at spark://10.0.2.15:37550/jars/twitter-streaming-assembly-1.1.jar with timestamp 1472440405882
16/08/28 20:13:26 INFO Executor: Starting executor ID driver on host localhost
16/08/28 20:13:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41264.
16/08/28 20:13:26 INFO NettyBlockTransferService: Server created on 10.0.2.15:41264
16/08/28 20:13:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41264 with 366.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
    at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
    at TwitterStreamingApp.main(TwitterStreamingApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more
16/08/28 20:13:26 INFO SparkContext: Invoking stop() from shutdown hook
16/08/28 20:13:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/08/28 20:13:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/28 20:13:26 INFO MemoryStore: MemoryStore cleared
16/08/28 20:13:26 INFO BlockManager: BlockManager stopped
16/08/28 20:13:26 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/28 20:13:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/28 20:13:26 INFO SparkContext: Successfully stopped SparkContext
16/08/28 20:13:26 INFO ShutdownHookManager: Shutdown hook called
16/08/28 20:13:26 INFO ShutdownHookManager: Deleting directory /tmp/spark-5e29c3b2-74c2-4d89-970f-5be89d176b26

I understand my post is lengthy. Your advice or insights are highly appreciated!

-jsung8

  • The error says that at runtime it could not find the Logging class in the package org.apache.spark. This class is simply missing from twitter-streaming-assembly-1.0.jar. Make sure it goes into the jar when you build. Is there any other jar created while you build this example? Commented Aug 29, 2016 at 19:28
  • This may help you: stackoverflow.com/a/39194820/2873538 Commented Aug 29, 2016 at 19:37
  • @amit_kumar & Ajeets This indeed looks promising! Thank you so much for the direction and the link!! Although the advice in the "287358" post is quite clear, a challenge for me is to identify the right pom.xml file to edit and also to download/configure Spark 1.6.2. I found half a dozen pom.xml files related to twitter-stream dir names... I'll play around and share what I find. Commented Aug 30, 2016 at 5:48
  • @jsung8 The answer 2873538 was recently written by me, and you should note that org.apache.spark.Logging is present in Spark 1.5.2, not the 1.6.2 which you seem to be going with. Commented Aug 30, 2016 at 5:55
  • @jsung8, can you post your pom.xml or .sbt file? Commented Aug 30, 2016 at 6:06

5 Answers


Use: spark-core_2.11-1.5.2.jar

I had the same problem described by @jsung8 and tried to find the .jar suggested by @youngstephen, but could not. However, linking in spark-core_2.11-1.5.2.jar instead of spark-core_2.11-1.5.2.logging.jar resolved the exception in the way @youngstephen suggested.
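
For reference, one way to link that jar in at build time is to pin the dependency in your sbt build. This is only a sketch with assumed coordinates; mixing a 1.5.2 spark-core into an app submitted to a Spark 2.0 runtime is fragile, so adjust the versions to whatever actually matches your cluster:

    // build.sbt sketch (assumed versions) -- pins spark-core to a release that
    // still ships org.apache.spark.Logging
    scalaVersion := "2.11.8"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"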


1 Comment

core_2.11-1.5.2.logging.jar is needed during the spark-submit command.

org/apache/spark/Logging was removed after Spark 1.5.2.

Since your spark-core version is 2.0, the simplest solution is:

Download a single spark-core_2.11-1.5.2.logging.jar and put it in the jars directory under your Spark root directory.

Anyway, it solved my problem; hope it helps.

2 Comments

Where do you find it? Would you have a link?
@romainjouin: sorry, no link out there. I just discussed it with my partner.

One reason that may cause this problem is a library and class conflict. I faced this problem and solved it using some Maven exclusions:

   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_2.11</artifactId>
       <version>2.0.0</version>
       <scope>provided</scope>
       <exclusions>
           <exclusion>
               <groupId>log4j</groupId>
               <artifactId>log4j</artifactId>
           </exclusion>
       </exclusions>
   </dependency>

   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-streaming_2.11</artifactId>
       <version>2.0.0</version>
       <scope>provided</scope>
   </dependency>

   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
       <version>2.0.0</version>
       <exclusions>
           <exclusion>
               <groupId>org.slf4j</groupId>
               <artifactId>slf4j-log4j12</artifactId>
           </exclusion>
           <exclusion>
               <groupId>log4j</groupId>
               <artifactId>log4j</artifactId>
           </exclusion>
       </exclusions>
   </dependency>



You're using an old version of the Spark Twitter connector. This class from your stack trace hints at that:

org.apache.spark.streaming.twitter.TwitterUtils

Spark removed that integration in version 2.0. You're using a connector from an old Spark version that references the old Logging class, which moved to a different package in Spark 2.0.

If you want to use Spark 2.0, you'll need to use the Twitter connector from the Bahir project.
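
Since the build here uses sbt, a minimal sketch of the dependency change (coordinates assumed for Scala 2.11 and Spark 2.0.0; adjust to your build):

    // build.sbt sketch -- for Spark 2.x the Twitter connector lives in Apache Bahir
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"         % "2.0.0" % "provided",
      "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0"
    )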



One option is to downgrade the Spark core version to 1.5 because of the error below:

java.lang.NoClassDefFoundError: org/apache/spark/Logging

However, http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/ provides the better solution for this. By adding the dependency below, my issue was resolved.

   <dependency>
       <groupId>org.apache.bahir</groupId>
       <artifactId>spark-streaming-twitter_2.11</artifactId>
       <version>2.0.0</version>
   </dependency>

