
I am a newbie in Spark. I have an application that runs Spark SQL queries by invoking spark-shell: it generates a set of queries like the one below and invokes the spark-shell command to process them one by one.

val Query = spark.sql("""SELECT userid AS userid, rating AS rating, movieid AS movieid FROM default.movieTable""")

Now I want to run this application using spark-submit instead of spark-shell. Can anybody tell me how to do that?

1 Answer


If you are using Scala, spark-submit takes a jar file, so you will have to create a Scala project with sbt as the dependency/build tool; sbt can take all your code and bundle it into a jar file. You can follow this guide. Similar approaches exist for Python and Java.
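For instance, a minimal build.sbt might look like the following sketch (the project name, Scala version, and Spark version are assumptions; adjust them to your environment):

// build.sbt - minimal sketch; name and versions are assumptions
name := "awesomeApp"
version := "0.1"
scalaVersion := "2.12.15"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0" % "provided"

Running sbt package then produces a jar under target/scala-2.12/, which you pass to spark-submit, roughly like this (the class name and jar path are assumptions):

spark-submit --class com.example.AwesomeApp --master local[*] target/scala-2.12/awesomeapp_2.12-0.1.jar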

Update 1: spark-shell is intended for quick experiments; when it is invoked, it comes with a SparkSession instantiated automatically. When you want to achieve the same thing programmatically, you need to create that SparkSession yourself.

For example:

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession =
  SparkSession.builder.appName("awesomeApp").getOrCreate()

// This import is needed to use the $-notation; it is imported automatically in `spark-shell` by default
import sparkSession.implicits._

...
//code to generate/import/build your `movieTable` view/table
...

val queryOutputDf = sparkSession.sql("""SELECT userid AS userid, rating AS rating, movieid AS movieid FROM default.movieTable""")

// The above output is a DataFrame; it needs to be written to a file
queryOutputDf.rdd.map(_.toString()).saveAsTextFile("/path/to/a/file/with/good/name")

This would achieve your intention for a single query; you would have to loop through your queries and pass each one to the code above.
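For instance, if the queries live in a Map keyed by name, such a loop could look roughly like this (the map contents and output base path are placeholders, not part of your actual application):

// Sketch: run every query in a map and write each result to its own path
val queries: Map[String, String] = Map(
  "ratings" -> """SELECT userid AS userid, rating AS rating, movieid AS movieid FROM default.movieTable"""
)

queries.foreach { case (name, sql) =>
  val df = sparkSession.sql(sql)
  df.rdd.map(_.toString()).saveAsTextFile(s"/path/to/output/$name")
}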


3 Comments

Thank you for the response. Currently my application takes each query and passes it to spark-shell, just like typing queries interactively. When I switch to spark-submit, my current application won't be able to run each query that way. I want to bundle these queries and invoke spark-submit. How can I do that? How should these queries be organised? Any idea on that? Correct me if I am going in the wrong direction.
Thank you. So, in my case I have to dynamically create the class and add the queries to it. The queries are stored in a map. I have to get each query and add it to the class. Once the class is created, I am going to trigger spark-submit using the Java ProcessBuilder. Is there any better idea?
@far2c You could drive everything with a property file; it totally depends on numerous parameters, like volume of data, parallelism, scheduling, etc. You can look up more about Spark scheduling.
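For instance, a rough sketch of driving the queries from a property file (the file name, keys, and output path are assumptions):

// Sketch: read name=SQL pairs from a property file and run each query
import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
props.load(new FileInputStream("queries.properties"))

props.stringPropertyNames().forEach { name =>
  val sql = props.getProperty(name)
  sparkSession.sql(sql).rdd.map(_.toString()).saveAsTextFile(s"/path/to/output/$name")
}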
