
I'm new to using servers. On my PC I don't have any problem using Apache Spark; I normally use IntelliJ to run the code.

I tried to run a project on an external server over SSH, and I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$assertOnDriver(SparkSession.scala:1086)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:902)
    at com.p53.main(p53.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMainV2.main(AppMainV2.java:131)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 8 more

When I run spark-shell from the terminal (/usr/local/spark/bin/spark-shell), Spark works fine.

My pom dependencies are:

<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.4.3</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-catalyst_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>

The pom plugins:

<plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>

I know I'm doing something wrong or something is missing, but I just can't figure out what the problem is.

2 Comments
  • Spark is built against Hadoop 2.7, not 3.2, and it's transitive, so you don't need to mention it in the POM. Commented Oct 11, 2019 at 3:08
  • I removed it and it still doesn't work, nor with Hadoop 2.7. @cricket_007 Commented Oct 14, 2019 at 9:37
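For illustration, a minimal sketch of what that first comment suggests, dropping the explicit Hadoop artifacts and letting spark-core pull Hadoop in transitively (versions as in the question; spark-catalyst can go too, since spark-sql already depends on it):

    <dependencies>
        <!-- hadoop-hdfs / hadoop-common / hadoop-client removed:
             spark-core 2.4.3 already brings Hadoop in transitively -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>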

1 Answer


You need to set SPARK_DIST_CLASSPATH so that the Hadoop classes end up on Spark's runtime classpath:

export SPARK_DIST_CLASSPATH=`hadoop classpath`
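To make this permanent, a minimal sketch (assuming the /usr/local/spark install mentioned in the question, and that the hadoop command is on the PATH) is to append the export to Spark's conf/spark-env.sh, which Spark sources on startup:

    # Assumes Spark lives under /usr/local/spark and `hadoop` is on the PATH.
    # conf/spark-env.sh is sourced every time Spark starts.
    echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> /usr/local/spark/conf/spark-env.sh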

4 Comments

Should the classpath point to a separately downloaded Hadoop, or to the Hadoop jars inside Spark? Thx
@KarenciaGárate - hadoop jars in spark.
I added this and I still have the same problem. My bash is: export SPARK_HOME/home/kgarate/spark export HADOOP_HOME=/home/kgarate/hadoop export SPARK_DIST_CLASSPATH=/home/kgarate/spark/jars/*
But I also tried with my own hadoop classpath @Jerin
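For reference, a sketch of those exports in standard bash syntax (paths copied from the comment above; note that export needs an = between the name and the value):

    # Sketch only - paths are the ones quoted in the comment above.
    export SPARK_HOME=/home/kgarate/spark
    export HADOOP_HOME=/home/kgarate/hadoop
    # Either point at Spark's bundled jars...
    export SPARK_DIST_CLASSPATH="$SPARK_HOME/jars/*"
    # ...or at a full Hadoop install:
    # export SPARK_DIST_CLASSPATH=$(hadoop classpath)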
