I am currently working on Apache Spark. I want to measure the time the system takes to perform a word count on a text file and store the result in a file, and I need to automate the commands with a bash script. I tried running the following script:

start-all.sh
    (time spark-shell 
     val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
     val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_);
     counts.toDebugString
     counts.cache()
     counts.saveAsTextFile("output")
     exit()
     ) 2> /home/pi/Desktop/spark_output/test.txt
stop-all.sh

which failed with the following error:

./wordcount_spark.sh: line 4: syntax error near unexpected token `('
./wordcount_spark.sh: line 4: ` val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")'

I tried adding EOF as a here-document delimiter and got the following error:

./wordcount_spark.sh: line 12: warning: here-document at line 3 delimited by end-of-file (wanted `EOF')
./wordcount_spark.sh: line 13: syntax error: unexpected end of file

I don't understand how to pass Scala commands through a bash script.


1 Answer


spark-shell is an interactive tool, meant for a user typing one command after another, so it is ill-suited to what you need.
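
For what it's worth, the bash errors you saw happen because the shell tries to run the Scala lines as shell commands; a here-document feeds them to spark-shell's stdin instead. A minimal sketch of that approach, reusing the paths from your script (the closing EOF must start at the very beginning of its line, which is what the "delimited by end-of-file" warning was pointing at):

    #!/bin/bash
    start-all.sh

    # Pipe the Scala commands into spark-shell via a here-document.
    # `time` writes to stderr, so the redirect captures the timing
    # (along with Spark's own log output).
    (time spark-shell <<'EOF'
    val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
    val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("output")
    EOF
    ) 2> /home/pi/Desktop/spark_output/test.txt

    stop-all.sh

Keep in mind the measured time will include the shell's own startup cost, which is one more reason the approach below is a better fit.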

You should take a look at the Self-Contained Applications section of Spark's Quick Start guide, which walks you through writing and building a simple Scala application and running it with spark-submit. That should fit your requirement much better.
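As a rough sketch of what that looks like (the class name, build layout, and jar path below are placeholders, not something from your post), the word count becomes a small standalone Scala application:

    // WordCount.scala -- minimal self-contained word count
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")
        val sc = new SparkContext(conf)

        val counts = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
          .flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("output")

        sc.stop()
      }
    }

Once it is packaged with sbt, your bash script shrinks to timing a single non-interactive command, along the lines of:

    (time spark-submit --class WordCount path/to/wordcount.jar) 2> /home/pi/Desktop/spark_output/test.txt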
