
While learning Spark 2 in Scala, I found that there are two ways to query data in Spark SQL:

  1. spark.sql(SQL_STATEMENT) // variable "spark" is an instance of SparkSession
  2. Dataset/DataFrame.select/.where/.groupBy....

My question is: what are the differences (functional, performance, etc.) between them? I tried to find the answer on the internet and in the documentation, but failed, so I would like to hear your opinions.

2 Answers 2

2

I think a query written as a SQL string and the same query written with the DataFrame/Dataset API are equivalent. Both go through the same internals and use the same engine. But I would prefer to write queries without SQL strings, since they are easier to write and provide some level of type safety.

Among these:

  1.  spark.sql(SQL_STATEMENT) // variable "spark" is a SparkSession
  2.  Dataset/DataFrame.select/.where/.groupBy....

I would choose number 2 in most cases, since it provides some level of type safety.
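As a rough way to check the "same engine" claim yourself, the sketch below runs one query in both styles and prints the physical plan of each. It assumes Spark 2.x with `spark-sql` on the classpath; the `sales` data, the column names and the `SqlVsDataFrame` object are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object SqlVsDataFrame {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-vs-dataframe")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val sales = Seq(("a", 10), ("b", 5), ("a", 7), ("c", 1)).toDF("key", "amount")
    sales.createOrReplaceTempView("sales")

    // Style 1: the query as a SQL string.
    val viaSql = spark.sql(
      "SELECT key, SUM(amount) AS total FROM sales WHERE amount > 1 GROUP BY key")

    // Style 2: the same query through the DataFrame API.
    val viaApi = sales
      .where(col("amount") > 1)
      .groupBy(col("key"))
      .agg(sum(col("amount")).as("total"))

    // Print both physical plans so they can be compared by eye.
    viaSql.explain()
    viaApi.explain()

    // Both styles should return the same rows (order is not guaranteed).
    assert(viaSql.collect().toSet == viaApi.collect().toSet)

    spark.stop()
  }
}
```

If the two printed plans match for your own queries, that is reasonable evidence that both styles end up in the same optimizer and execution engine, although it does not prove it for every possible query.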


6 Comments

That's not true; for example, you cannot use subqueries in the DataFrame API.
Yeah, that's true, we cannot use subqueries in the DataFrame API. I forgot to mention it. Thank you.
@ShankarKoirala Thank you for your opinion, but I am wondering what the reference for it is. How do you know they use the same engine inside?
@ShankarKoirala Thank you, haha, but it seems it's a dead loop.
1

By using the DataFrame API (which is available from Scala, Java, Python and R, not only Java), one can debug a query by breaking it down into simple statements. This helps in understanding it better.

The only thing that makes a difference is which underlying algorithm is used for grouping: HashAggregation or SortAggregation. HashAggregation is generally more efficient than SortAggregation.

SortAggregation sorts the rows and then gathers the matching rows together, which costs O(n log n).

HashAggregation builds a hash map that uses the grouping columns as the key and the rest of the columns as the values, which costs O(n). Spark SQL uses HashAggregation where possible (if the data for the values is mutable).
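To make the two grouping strategies concrete, here is a minimal plain-Scala sketch of a grouped sum done both ways. It is only a conceptual illustration with hypothetical names (`AggregationSketch`, `hashAggregate`, `sortAggregate`); it is not Spark's internal implementation.

```scala
import scala.collection.mutable

object AggregationSketch {
  // Hash-based grouping: a single pass that updates a running total per key.
  // Expected cost is O(n) for n rows.
  def hashAggregate(rows: Seq[(String, Int)]): Map[String, Long] = {
    val totals = mutable.HashMap.empty[String, Long]
    for ((key, value) <- rows) {
      totals(key) = totals.getOrElse(key, 0L) + value
    }
    totals.toMap
  }

  // Sort-based grouping: sort by key (O(n log n)), then merge adjacent
  // rows that share the same key in one linear pass.
  def sortAggregate(rows: Seq[(String, Int)]): Map[String, Long] = {
    val sorted = rows.sortBy(_._1)
    val out = mutable.ArrayBuffer.empty[(String, Long)]
    for ((key, value) <- sorted) {
      if (out.nonEmpty && out.last._1 == key) {
        // Same key as the previous row: extend the current group.
        out(out.length - 1) = (key, out.last._2 + value)
      } else {
        // New key: start a new group.
        out += ((key, value.toLong))
      }
    }
    out.toMap
  }
}
```

Both functions return the same totals for the same input; they differ only in how much work they do to get there. In Spark itself, calling `explain()` on a grouped query should show which aggregate operator the planner picked.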
