
I am trying to use Spark SQL to query a table where user-filled dates didn't follow any strict formatting, so you see entries such as "10/4/21" and "10/04/21", for example. My query attempt was

select CAST(unix_timestamp(thedate, "MM/d/yy") AS TIMESTAMP) from table

and this works for dates like "10/4/21" but not for "10/04/21" (and vice versa: "MM/dd/yy" works for "10/04/21" but not for "10/4/21"). Is there any way to format the query so that it works for either date format?

1 Answer
You can try converting thedate with both formats and wrap the results in COALESCE, so that whichever format parses successfully is returned. unix_timestamp returns NULL when the string doesn't match the pattern, so COALESCE simply skips the failed conversion.


sql = """
SELECT thedate,
       CAST(COALESCE(unix_timestamp(thedate, 'MM/d/yy'),
                     unix_timestamp(thedate, 'MM/dd/yy')) AS TIMESTAMP) AS thedate_astimestamp
FROM   (SELECT explode(array('10/4/21', '10/04/21', '10/11/21')) AS thedate)
"""
spark.sql(sql).show()

Output

+--------+-------------------+
| thedate|thedate_astimestamp|
+--------+-------------------+
| 10/4/21|2021-10-04 00:00:00|
|10/04/21|2021-10-04 00:00:00|
|10/11/21|2021-10-11 00:00:00|
+--------+-------------------+
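The same fall-back idea can be sketched in plain Python, independent of Spark: try each candidate format in order and return the first successful parse, just as COALESCE picks the first non-NULL unix_timestamp result. (Note that strptime's %m and %d are lenient about leading zeros, so a single format already covers both sample values; the loop shows the general pattern for genuinely incompatible formats. The function name here is my own, not from the question.)

```python
from datetime import datetime

def parse_loose_date(text, formats=("%m/%d/%y", "%m-%d-%y")):
    """Return the first datetime that any candidate format yields, else None.

    Mirrors COALESCE(unix_timestamp(col, fmt1), unix_timestamp(col, fmt2), ...)
    in the Spark SQL answer: a failed parse acts like a NULL that is skipped.
    """
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format didn't match; fall through to the next
    return None  # no format matched (all "NULL")

# Both zero-padded and unpadded day values resolve to the same date.
print(parse_loose_date("10/4/21"))   # 2021-10-04 00:00:00
print(parse_loose_date("10/04/21"))  # 2021-10-04 00:00:00
```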