
I am trying to execute a simple Scala script with Spark, as described in the Spark Quick Start tutorial. I have no trouble executing the following Python code:

"""SimpleApp.py"""
from pyspark import SparkContext

logFile = "tmp.txt"  # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print "Lines with a: %i, lines with b: %i" % (numAs, numBs)

I execute this code using the following command:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.py

However, if I try to do the same using Scala, I have technical problems. In more detail, the code that I try to execute is:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "tmp.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

I try to execute it in the following way:

/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.scala

As the result I get the following error message:

Error: Cannot load main class from JAR file

Does anybody know what I am doing wrong?

  • Navigate in the terminal (command line) to the folder where your Scala file is located. Then run "scalac YourClassName.scala". Finally, execute it by running "scala YourClassName". BTW, you need to install Scala before these steps :) Commented Jun 3, 2017 at 17:46
  • "from JAR file" is right there in the message, so that's what you're doing wrong Commented Jun 3, 2017 at 17:48
  • @AlexFruzenshtein, if I execute scalac hello_world.scala I get the error message error: object apache is not a member of package org Commented Jun 3, 2017 at 18:26

2 Answers


I want to add to @JacekLaskowski's answer an alternative solution I sometimes use for POC or testing purposes.

It is to run your script.scala from inside the spark-shell with :load:

:load /path/to/script.scala

You won't need to define a SparkContext/SparkSession, as the script will use the variables (such as sc and spark) already defined in the REPL's scope.

You also don't need to wrap the code in a Scala object.
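
For illustration, a minimal script.scala along these lines (the file name and contents are placeholders mirroring the question's example) that relies on the sc the shell already provides:

// script.scala -- load inside spark-shell with :load; uses the shell's predefined sc
val logData = sc.textFile("tmp.txt").cache()
val numAs = logData.filter(_.contains("a")).count()
val numBs = logData.filter(_.contains("b")).count()
println(s"Lines with a: $numAs, lines with b: $numBs")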

PS: I consider this more of a hack, not something to use for production purposes.


5 Comments

Also, spark-shell can take a "-i file.scala" argument, where the file.scala contents are evaluated in the REPL as if typed or :loaded, then the REPL exits (a sketch of this follows these comments).
@Garren and it's not the only way to do it, but I don't want to give more "hacks" like this. (It's a bad habit)
While I agree this style should absolutely not be used for production applications, it is a fantastic way to quickly prototype, so omitting a valid strategy on the grounds that "hacks" form bad habits is disingenuous.
@Garren You're probably right about my assumptions, but I totally agree that it's a fantastic way to quickly prototype, and that's why I use them. I'll update my answer later.
If the script is wrapped in a Scala object's main method, you can execute it with [object name].main(Array())
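
For reference, the -i variant mentioned in the first comment would look roughly like this (the path is just a placeholder):

spark-shell -i /path/to/script.scala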

Use spark-submit --help to see the available options and arguments.

$ ./bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

As you can see in the first Usage spark-submit requires <app jar | python file>.

The app jar argument is a Spark application's jar with the main object (SimpleApp in your case).

You can build the app jar using sbt or Maven, as described in the official documentation's Self-Contained Applications section:

Suppose we wish to write a self-contained application using the Spark API. We will walk through a simple application in Scala (with sbt), Java (with Maven), and Python.

and later in the section:

we can create a JAR package containing the application’s code, then use the spark-submit script to run our program.
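
For reference, a minimal sbt setup along those lines might look like this; the project name, version, and layout are assumptions, while the Scala and Spark versions match the spark-2.1.0-bin-hadoop2.7 distribution used above:

// build.sbt -- assumed minimal build definition for the SimpleApp object above
name := "simple-app"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

With SimpleApp.scala under src/main/scala, sbt package produces a jar under target/scala-2.11/ that you can hand to spark-submit:

$ sbt package
$ /home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit \
    --class SimpleApp \
    --master local \
    target/scala-2.11/simple-app_2.11-0.1.jar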


p.s. Use Spark 2.1.1.

