3

In Spark I am trying to execute SQL queries on a temporary table derived from a data frame that I manually built by reading a csv file and converting the columns into the right data type.

Specifically, the table I'm talking about is the LINEITEM table from [TPC-H specification][1]. Unlike stated in the specification I am using TIMESTAMP rather than DATE because I've read that Spark does not support the DATE type.

In my single scala source file, after creating the data frame and registering a temporary table called "lineitem", I am trying to execute the following query:

val res = sqlContext.sql("SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01 00:00:00');")

When I submit the packaged jar using spark-submit, I get the following error:

Exception in thread "main" java.lang.RuntimeException: [1.75] failure: ``union'' expected but but `;' found

When I omit the semicolon and do the same thing, I get the following error:

Exception in thread "main" java.util.NoSuchElementException: key not found: date

Spark version is 1.4.0.

Does anyone have an idea what's the problem with these queries?

[1] http://www.tpc.org/TPC_Documents_Current_Versions/pdf/tpch2.17.1.pdf

1

2 Answers 2

7
  1. SQL queries passed to SQLContext.sql shouldn't be delimited using semicolon - this the source of your first problem
  2. DATE UDF expects date in the YYYY-­MM-­DD form and DATE('1998-12-01 00:00:00') evaluates to null. As long as timestamp can be casted to DATE correct query string looks like this:

    "SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01')"
    
  3. DATE is a Hive UDF. It means you have to use HiveContext not a standard SQLContext - this is the source of your second problem.

    import org.apache.spark.sql.hive.HiveContext
    
    val sqlContext = new HiveContext(sc) // where sc is a SparkContext
    
  4. In Spark >= 1.5 it is also possible to use to_date function:

    import org.apache.spark.sql.functions.{lit, to_date}
    
    df.where(to_date($"shipdate") <= to_date(lit("1998-12-01")))
    
Sign up to request clarification or add additional context in comments.

Comments

2

Please try hive function CAST (expression AS toDatatype) It changes an expression from one datatype to other
e.g. CAST ('2016-06-17 00.00.000' AS DATE) will convert String to Date
In your case
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE CAST(l.shipdate as DATE) <= CAST('1998-12-01 00:00:00' AS DATE);")

Supported datatype conversions are as listed in Hive Casting Dates

2 Comments

A little bit of explanation would not hurt, see Explaining entirely code-based answers
@J.J.Hakala Thanks for information. I am still learning about how to work on stackoverflow. Will keep in mind about this.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.