I am new to Spark/Scala/Hive. I am just wondering whether there are any differences between calling

spark = new SparkSession(...).getHiveContext()
spark.sql("SELECR * FROM table")

and

spark = new SparkSession(...).getHiveContext() // not using
spark.read.table(table).select(from("*"))

In particular, is there any performance difference?

1 Answer


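These two snippets have the same run-time performance: both the SQL string and the DataFrame calls are turned into a logical plan and optimized by the same Catalyst optimizer before execution.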
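One way to check this yourself is to print the plans Spark produces for the two forms. The sketch below is only an illustration, assuming Spark 2.x or later and a local master; the view name `people` and its sample rows are made up for the example, and Hive support is left off so that it runs without a metastore.

```scala
import org.apache.spark.sql.SparkSession

object PlanComparison {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; add .enableHiveSupport() to read Hive tables.
    val spark = SparkSession.builder()
      .appName("plan-comparison")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data registered under the made-up name "people".
    Seq(("Ann", 31), ("Bob", 45)).toDF("name", "age")
      .createOrReplaceTempView("people")

    val viaSql = spark.sql("SELECT * FROM people")
    val viaApi = spark.read.table("people").select("*")

    // Both forms go through the same optimizer; print the plans to compare them.
    viaSql.explain()
    viaApi.explain()

    spark.stop()
  }
}
```

The same check works for sorting or grouping queries: build the query both ways and compare what `explain()` prints.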

The second API is safer: if you make a typo or try to use an unsupported operation, it gives you a quick and clear compilation error. It's funny that you wrote SELECR and not SELECT; that is a good illustration of this point :)
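A small sketch of where each kind of mistake surfaces, assuming a local session and a made-up `people` view. A misspelled keyword inside a SQL string is only detected when Spark parses the string at run time, while a misspelled method name in the DataFrame API is rejected by the Scala compiler. Note that column names passed as strings are still resolved at run time in both forms.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object TypoDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: local session without Hive support, for illustration only.
    val spark = SparkSession.builder()
      .appName("typo-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical one-row view named "people".
    Seq(("Ann", 31)).toDF("name", "age").createOrReplaceTempView("people")

    // This compiles fine; the typo is only reported when the string is parsed.
    val result = Try(spark.sql("SELECR * FROM people"))
    println(s"SQL typo failed at run time: ${result.isFailure}")

    // The equivalent typo in the API would not compile at all:
    // spark.read.table("people").selecr("*")  // value selecr is not a member

    spark.stop()
  }
}
```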


1 Comment

Thanks for the reply! Do they still share the same run-time performance for other kinds of queries, say sorting or grouping? Also, I saw this article by Hortonworks: community.hortonworks.com/articles/42027/…, but I was wondering whether enabling HiveContext makes any difference, since it will be using HiveQL?
