
I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark DataFrame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.

My code to convert this string to timestamp is

CAST (time_string AS Timestamp)

But this gives me a timestamp of 2017-07-31 19:26:59

Why is it changing the time? Is there a way to do this without changing the time?
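For context, the shift is consistent with the 'Z' suffix being honored as UTC and the result being rendered in the session's local timezone: 19:26:59 the previous day is exactly UTC-7 (US Pacific daylight time). A minimal Python sketch of that conversion, assuming an America/Los_Angeles session timezone:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Parse the UTC-suffixed string ("Z" is accepted by %z since Python 3.7)
utc_dt = datetime.strptime("2017-08-01T02:26:59.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")

# Render the same instant in a UTC-7 local timezone
local_dt = utc_dt.astimezone(ZoneInfo("America/Los_Angeles"))
print(local_dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-07-31 19:26:59
```

The instant in time is unchanged; only its display representation moves by the zone offset.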

Thanks for any help!

  • try unix_timestamp to cast it to TimestampType :) Commented Aug 8, 2017 at 8:07
  • See also stackoverflow.com/questions/35761586/… . Commented Oct 19, 2017 at 7:31
  • In Spark 2.3.2 (emr-5.19.0), I can confirm the CAST(time_string AS timestamp) worked without a glitch (didn't change the time) Commented Feb 12, 2019 at 7:13

3 Answers


You could use the unix_timestamp function to convert the UTC-formatted date string to a timestamp:

import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType

val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")

df2.withColumn("eventTime", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType))
  .show(false)

Output:

+-----+---------------------+
|id   |eventTime            |
+-----+---------------------+
|a3fac|2017-08-01 02:26:59.0|
+-----+---------------------+

Hope this helps!
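A likely reason the displayed time is unchanged here: with 'Z' quoted as a literal in the pattern, the string is parsed without any timezone information, so the wall-clock fields pass through as written. A plain-Python sketch of that naive parse (illustrative only, not Spark's internals):

```python
from datetime import datetime

# "Z" matched as a literal character, not a zone offset: the parse is
# timezone-naive, so the wall-clock fields are kept exactly as written
naive = datetime.strptime("2017-08-01T02:26:59.000Z", "%Y-%m-%dT%H:%M:%S.%fZ")
shown = naive.strftime("%Y-%m-%d %H:%M:%S.%f")[:-5]  # trim to one fractional digit
print(shown)  # 2017-08-01 02:26:59.0
```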


2 Comments

Suppose we have 9 digits past the decimal, do we really need to specify ss.SSSSSSSSS'Z'?
Your format string could read "yyyy-MM-dd'T'HH:mm:ss.SSSX" to parse any timezone, not just UTC
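The comment's point about X generalizes: a pattern that parses the zone instead of treating it as a literal produces a zone-aware timestamp for any offset, not just UTC. A Python analogue using %z, which accepts both "Z" and numeric offsets (Python 3.7+):

```python
from datetime import datetime

# %z accepts "Z" as well as numeric offsets such as "+05:30"
utc = datetime.strptime("2017-08-01T02:26:59.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
ist = datetime.strptime("2017-08-01T02:26:59.000+05:30", "%Y-%m-%dT%H:%M:%S.%f%z")
print(utc.utcoffset(), ist.utcoffset())  # 0:00:00 5:30:00
```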

Solution in Java

There are some Spark SQL functions that let you work with the date format.

Conversion example : 20181224091530 -> 2018-12-24 09:15:30

Solution (Spark SQL statement) :

SELECT
 ...
 to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
 ...
FROM some_table

You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute a SQL statement, Spark provides the following:

...
// You have to create an instance of SparkSession
sparkSession.sql(sqlStatement); 
...

Notes:

  • You have to convert the decimal to a string first; after that you can parse it into a timestamp
  • You can adjust the format pattern to get whatever output format you want
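The cast-then-parse steps above can be sketched in plain Python, with strptime standing in for Spark's to_timestamp (note the format letters differ between the two):

```python
from datetime import datetime

# cast(DECIMAL_DATE as string): the numeric 20181224091530 becomes "20181224091530"
decimal_date = 20181224091530

# to_timestamp(..., 'yyyyMMddHHmmss') equivalent: parse the packed digits
stamp = datetime.strptime(str(decimal_date), "%Y%m%d%H%M%S")
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-12-24 09:15:30
```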

1 Comment

In my case I used to_timestamp(cast(timestamp as string),'yyyyMMddHHmmss') and the output comes out as null, whereas my timestamp column value is 2019/08/02 15:00:00. Let me know if I am missing anything.
  1. In Spark SQL you can use to_timestamp and then format the result as required: select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event

  2. Here 'timestamp' is a StringType column in the 'event' table with the value 2019/02/23 12:00:00. To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). You need to make sure the format pattern matches your column's value. Then apply date_format to render it as required.

> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
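The parse-then-reformat pair can be sketched in plain Python, with strptime/strftime standing in for Spark's to_timestamp/date_format (format letters differ between the two):

```python
from datetime import datetime

# to_timestamp step: parse the string using the format it is actually in
parsed = datetime.strptime("2019/02/23 12:00:00", "%Y/%m/%d %H:%M:%S")

# date_format step: render it in the desired output format
reformatted = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(reformatted)  # 2019-02-23 12:00:00
```

If the input format doesn't match the column's actual value, the parse fails, which is the typical cause of the nulls mentioned in the comment above.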

