
I'm trying to get unix time in milliseconds (13 digits) from a timestamp field, but currently it returns seconds (10 digits).

scala> var df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.123", "2017-01-18 11:00:00.882", "2017-01-18 11:00:02.432").toDF()
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df = df.selectExpr("value timeString", "cast(value as timestamp) time")
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp]


scala> df = df.withColumn("unix_time", unix_timestamp(df("time")))
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp ... 1 more field]

scala> df.take(4)
res63: Array[org.apache.spark.sql.Row] = Array(
[2017-01-18 11:00:00.000,2017-01-18 11:00:00.0,1484758800], 
[2017-01-18 11:00:00.123,2017-01-18 11:00:00.123,1484758800], 
[2017-01-18 11:00:00.882,2017-01-18 11:00:00.882,1484758800], 
[2017-01-18 11:00:02.432,2017-01-18 11:00:02.432,1484758802])

Even though 2017-01-18 11:00:00.123 and 2017-01-18 11:00:00.000 are different, I get the same unix time back: 1484758800.

What am I missing?

7 Answers


The milliseconds hide in the fractional part of the timestamp.

Try this:

df = df.withColumn("time_in_milliseconds", col("time").cast("double"))

You'll get something like 1484758800.792, where 792 is the milliseconds part.

At least it works for me (Scala, Spark, Hive).
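For the DataFrame from the question, a minimal sketch of the same idea in Scala (the column names epoch_seconds and unix_time_ms are made up for illustration) could look like this:

import org.apache.spark.sql.functions.col

// Casting a timestamp to double yields epoch seconds with a fractional part;
// scaling by 1000 and casting to long yields a 13-digit millisecond value.
val withMillis = df
  .withColumn("epoch_seconds", col("time").cast("double"))
  .withColumn("unix_time_ms", (col("time").cast("double") * 1000).cast("long"))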


1 Comment

This is also of particular usefulness if you need to round a time field to the nearest second. unix_timestamp() does not properly round the timestamp; instead it takes the mathematical floor of the time down to the second, ignoring the milliseconds fraction. The solution in this answer provides a decimal unix time which can be rounded, thus providing a way to round a timestamp to the nearest second.
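As a sketch of that rounding point, assuming the df and its time column from the question:

import org.apache.spark.sql.functions.{col, round}

// round() on the decimal epoch value rounds to the nearest whole second,
// whereas unix_timestamp() always floors, dropping the milliseconds.
val rounded = df.withColumn(
  "unix_time_rounded",
  round(col("time").cast("double")).cast("long")
)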

Implementing the approach suggested in Dao Thi's answer

import pyspark.sql.functions as F
df = spark.createDataFrame([('22-Jul-2018 04:21:18.792 UTC', ),('23-Jul-2018 04:21:25.888 UTC',)], ['TIME'])
df.show(2,False)
df.printSchema()

Output:

+----------------------------+
|TIME                        |
+----------------------------+
|22-Jul-2018 04:21:18.792 UTC|
|23-Jul-2018 04:21:25.888 UTC|
+----------------------------+
root
|-- TIME: string (nullable = true)

Convert the string time format (including milliseconds) to unix_timestamp (double). Extract the milliseconds from the string using the substring method (start_position = -7, length_of_substring = 3) and add them separately to the unix_timestamp (cast the substring to float before adding).

df1 = df.withColumn("unix_timestamp",F.unix_timestamp(df.TIME,'dd-MMM-yyyy HH:mm:ss.SSS z') + F.substring(df.TIME,-7,3).cast('float')/1000)

Converting unix_timestamp(double) to timestamp datatype in Spark.

df2 = df1.withColumn("TimestampType",F.to_timestamp(df1["unix_timestamp"]))
df2.show(n=2,truncate=False)

This will give you the following output:

+----------------------------+----------------+-----------------------+
|TIME                        |unix_timestamp  |TimestampType          |
+----------------------------+----------------+-----------------------+
|22-Jul-2018 04:21:18.792 UTC|1.532233278792E9|2018-07-22 04:21:18.792|
|23-Jul-2018 04:21:25.888 UTC|1.532319685888E9|2018-07-23 04:21:25.888|
+----------------------------+----------------+-----------------------+

Checking the Schema:

df2.printSchema()


root
 |-- TIME: string (nullable = true)
 |-- unix_timestamp: double (nullable = true)
 |-- TimestampType: timestamp (nullable = true)



It cannot be done with unix_timestamp(), but since Spark 3.1.0 there is a built-in function called unix_millis():

unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
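For example, applied to the timestamp column from the question on Spark 3.1.0 or later, a sketch going through expr(), which works wherever the SQL function is available:

import org.apache.spark.sql.functions.expr

// unix_millis() is a SQL built-in since Spark 3.1.0; expr() lets us call it
// from the DataFrame API on the question's `time` column.
val withMillis = df.withColumn("unix_time_ms", expr("unix_millis(time)"))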



unix_timestamp() returns the unix timestamp in seconds.

The last 3 digits of the timestamp string are the milliseconds (1.999 sec = 1999 milliseconds), so just take the last 3 digits of the timestamp string and append them to the seconds value returned by unix_timestamp() to get the value in milliseconds.
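A sketch of that string-based approach against the question's DataFrame; it assumes the original string column timeString always carries exactly three fractional digits:

import org.apache.spark.sql.functions.{col, concat, substring, unix_timestamp}

// Seconds from unix_timestamp(), with the last 3 characters of the original
// string (the millisecond part) appended, then cast back to a long.
val withMillis = df.withColumn(
  "unix_time_ms",
  concat(
    unix_timestamp(col("time")).cast("string"),
    substring(col("timeString"), -3, 3)
  ).cast("long")
)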



Up to Spark version 3.0.1 it is not possible to convert a timestamp into unix time in milliseconds using the SQL built-in function unix_timestamp.

According to the code documentation in Spark's DateTimeUtils:

"Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision."

Therefore, if you define a UDF that takes a java.sql.Timestamp as input, you can call getTime on it to obtain a Long in milliseconds. If you apply unix_timestamp you will only get unix time with precision in seconds.

import org.apache.spark.sql.functions.{col, to_timestamp, udf, unix_timestamp}

val tsConversionToLongUdf = udf((ts: java.sql.Timestamp) => ts.getTime)

Applying this to a variety of Timestamps:

val df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.111", "2017-01-18 11:00:00.110", "2017-01-18 11:00:00.100")
  .toDF("timestampString")
  .withColumn("timestamp", to_timestamp(col("timestampString")))
  .withColumn("timestampConversionToLong", tsConversionToLongUdf(col("timestamp")))
  .withColumn("timestampUnixTimestamp", unix_timestamp(col("timestamp")))

df.printSchema()
df.show(false)

// returns
root
 |-- timestampString: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampConversionToLong: long (nullable = false)
 |-- timestampUnixTimestamp: long (nullable = true)

+-----------------------+-----------------------+-------------------------+----------------------+
|timestampString        |timestamp              |timestampConversionToLong|timestampUnixTimestamp|
+-----------------------+-----------------------+-------------------------+----------------------+
|2017-01-18 11:00:00.000|2017-01-18 11:00:00    |1484733600000            |1484733600            |
|2017-01-18 11:00:00.111|2017-01-18 11:00:00.111|1484733600111            |1484733600            |
|2017-01-18 11:00:00.110|2017-01-18 11:00:00.11 |1484733600110            |1484733600            |
|2017-01-18 11:00:00.100|2017-01-18 11:00:00.1  |1484733600100            |1484733600            |
+-----------------------+-----------------------+-------------------------+----------------------+



Wow, same as @Тимур Залимов: just cast it.

>>> df2 = df_msg.withColumn("datetime", F.col("timestamp").cast("timestamp")).withColumn("timestamp_back" , F.col("datetime").cast("double"))
>>> r = df2.rdd.take(1)[0]
>>> r.timestamp_back                                                            
1666509660.071501
>>> r.timestamp
1666509660.071501
>>> r.datetime
datetime.datetime(2022, 10, 23, 15, 21, 0, 71501)



To get the number of milliseconds since the UTC epoch, one could do:

scala> spark.sql("select unix_millis(current_timestamp())").show(truncate=false)
+--------------------------------+
|unix_millis(current_timestamp())|
+--------------------------------+
|1705680115519                   |
+--------------------------------+

