Difference between calling sql() and using Spark API call()

Question

I am new to Spark/Scala/Hive. I am just wondering whether there is any difference between calling

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
spark.sql("SELECR * FROM table")

and

import org.apache.spark.sql.functions.col
val spark = SparkSession.builder().enableHiveSupport().getOrCreate() // not using
spark.read.table(table).select(col("*"))

??

In particular, is there any performance difference?

OlivierBlanvillain · Accepted Answer · 2017-09-21 06:36:12Z

These two snippets have the same run-time performance.

The second API is safer: if you make a typo or try to use an unsupported operation, it will give you a quick and clear compilation error. It's funny that you wrote SELECR instead of SELECT; that's a good illustration of this point :)
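Both forms go through the same Catalyst optimizer, so one way to check the performance claim yourself is to print the plans side by side. The sketch below is only an illustration under some assumptions: it runs a local session without Hive, and the object name `SqlVsApi`, the helper names, and the temporary view `demo_table` are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import scala.util.Try

object SqlVsApi {
  // Local session for illustration only; a Hive-backed setup would add
  // .enableHiveSupport() to the builder.
  def localSession(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("sql-vs-api")
      .getOrCreate()

  // Query written as a SQL string: the string is parsed at run time.
  def viaSql(spark: SparkSession, table: String): DataFrame =
    spark.sql(s"SELECT * FROM $table")

  // The same query written with the DataFrame API.
  def viaApi(spark: SparkSession, table: String): DataFrame =
    spark.read.table(table).select(col("*"))

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    import spark.implicits._

    // Hypothetical sample data registered as a temporary view.
    Seq((1, "a"), (2, "b")).toDF("id", "name")
      .createOrReplaceTempView("demo_table")

    // Print the parsed, analyzed, optimized and physical plans of both
    // variants so they can be compared by eye.
    viaSql(spark, "demo_table").explain(true)
    viaApi(spark, "demo_table").explain(true)

    // A typo in the SQL keyword is only detected when the string is parsed,
    // i.e. when this line actually runs.
    val typo = Try(spark.sql("SELECR * FROM demo_table"))
    println(s"SQL typo rejected at run time: ${typo.isFailure}")

    spark.stop()
  }
}
```

Note that the compile-time safety applies to method names such as `select`; table and column names passed as strings are still resolved at run time in both variants.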

1 Comment

user3426014 Over a year ago

Thanks for the reply! Do they still share the same run-time performance for other kinds of queries, say sorting or grouping? Also, I saw this article by Hortonworks: community.hortonworks.com/articles/42027/…, but was wondering whether enabling HiveContext makes any difference, since it will be using HiveQL?
