
I have a Spark job written in Scala. I use

spark-shell -i <file-name>

to run the job. I need to pass a command-line argument to it. Right now, I invoke the script through a Linux task, where I do

export INPUT_DATE=2015/04/27 

and read the value through the environment variable:

System.getenv("INPUT_DATE")

Is there a better way to handle the command line arguments in Spark-shell?
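As an aside, the env-var approach itself can be made a little more robust with a fallback, so the script still runs when the variable is unset. This is a minimal sketch; the default date is just a placeholder, and `sys.env` is simply the idiomatic Scala view of `System.getenv`:

```scala
// Read INPUT_DATE from the environment, falling back to a placeholder
// default when the variable is not set.
val inputDate: String = sys.env.getOrElse("INPUT_DATE", "1970/01/01")
println(inputDate)
```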

3 Comments

  • why would you want to pass an argument in spark-shell? Why not use the spark-submit script to run the job normally? Commented Apr 29, 2015 at 13:20
  • Still running 0.9.1 in CDH 4.6; spark-submit is not available yet. Commented May 1, 2015 at 3:55
  • Another reason you might want to do this is to avoid the hassle of building a project when you are only running a two-line Scala script. See my answer below for how I solved this. Commented Jun 19, 2015 at 23:39

3 Answers


My solution is to use a custom key to define the arguments rather than spark.driver.extraJavaOptions, in case you someday pass in a value that interferes with the JVM's behavior.

spark-shell -i your_script.scala --conf spark.driver.args="arg1 arg2 arg3"

You can access the arguments from within your scala code like this:

val args = sc.getConf.get("spark.driver.args").split("\\s+")
args: Array[String] = Array(arg1, arg2, arg3)
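Outside a running Spark session, the parsing step can be sketched on its own; here `raw` stands in for the string that `sc.getConf.get("spark.driver.args")` would return after launching with `--conf spark.driver.args="arg1 arg2 arg3"`:

```scala
// Stand-in for the value returned by sc.getConf.get("spark.driver.args").
val raw = "arg1 arg2 arg3"

// Same split the answer uses. Note the limitation: an argument that
// itself contains whitespace cannot be expressed this way.
val args: Array[String] = raw.split("\\s+")
```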

2 Comments

nice. slightly cleaner than spark.driver.extraJavaOptions.
Actually, you can do something even easier with --conf spark.driver.arg1=... --conf spark.driver.arg2=...; it seems that all configs prefixed with spark.driver. are passed through to the driver.

Short answer:

spark-shell -i <(echo val theDate = \"$INPUT_DATE\" ; cat <file-name>)

Long answer:

This solution causes the following line to be added at the top of the file before it is passed to spark-shell:

val theDate = ...

thereby defining a new variable. The way this is done (the <( ... ) syntax) is called process substitution. It is available in Bash; see this question for more detail and for alternatives (e.g. mkfifo) for non-Bash environments.
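To see what the process substitution actually hands to the consuming command, the same construct can be sketched with cat in place of spark-shell (INPUT_DATE is set here only for the demo; the quoting makes the value a Scala string literal):

```shell
# cat reads the generated file just as spark-shell -i would.
INPUT_DATE=2015/04/27
prelude=$(cat <(echo "val theDate = \"$INPUT_DATE\""))
echo "$prelude"
```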

Making this more systematic:

Put the code below in a script (e.g. spark-script.sh); then you can simply run:

./spark-script.sh your_file.scala first_arg second_arg third_arg

and an Array[String] called args will hold your arguments.

The file spark-script.sh:

scala_file="$1"
shift 1
arguments="$@"

# set +o posix  # uncomment to enable process substitution when bash runs in POSIX mode

spark-shell --master yarn --deploy-mode client \
    --queue default \
    --driver-memory 2G --executor-memory 4G \
    --num-executors 10 \
    -i <(echo 'val args = "'"$arguments"'".split("\\s+")' ; cat "$scala_file")

1 Comment

Is there a better way to do it? Can I pass the arguments like spark-shell -i script.scala arg1 arg2, so that in the Scala file I can retrieve them as args(0), args(1)? There is a Scala solution for this at alvinalexander.com/scala/…; however, it doesn't work in spark-shell. Do you have any suggestions?

I use extraJavaOptions when I have a Scala script that is too simple to go through the build process but I still need to pass arguments to it. It's not beautiful, but it works, and you can quickly pass multiple arguments:

spark-shell -i your_script.scala --conf spark.driver.extraJavaOptions="-Darg1,arg2,arg3"

Note that -D does not belong to the arguments, which are arg1, arg2, and arg3. You can then access the arguments from within your scala code like this:

val sconf = new SparkConf()

// load string
val paramsString = sconf.get("spark.driver.extraJavaOptions")

// cut off `-D`
val paramsSlice = paramsString.slice(2,paramsString.length)

// split the string with `,` as delimiter and save the result to an array
val paramsArray = paramsSlice.split(",")

// access parameters
val arg1 = paramsArray(0)
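Once you pass more than a couple of positional values, key=value pairs read more clearly than indices. This is a sketch of that variant of the same trick; the "-Ddate=...,mode=..." format is my own convention, not anything Spark defines, and `paramsString` stands in for the value read from spark.driver.extraJavaOptions:

```scala
// Hypothetical launch: --conf spark.driver.extraJavaOptions="-Ddate=2015/04/27,mode=prod"
val paramsString = "-Ddate=2015/04/27,mode=prod"

// Strip the -D prefix, split on commas, then split each pair on the
// first '=' into a named parameter.
val params: Map[String, String] =
  paramsString.stripPrefix("-D").split(",").map { kv =>
    val Array(key, value) = kv.split("=", 2)
    key -> value
  }.toMap
```

Looking up `params("date")` is then self-documenting where `paramsArray(0)` is not.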

1 Comment

Include import statement: import org.apache.spark.SparkConf
