
I need to filter a dataframe on its timestamp column so that only rows that are at most 10 minutes old are kept. The dataframe looks like this:

ID,timestamp,value
ID-1,8/23/2017 6:11:13,4.56
ID-2,8/23/2017 6:5:21,5.92
ID-3,8/23/2017 5:49:13,6.00

I am trying the following code, but it does not give the desired output:

df.filter(to_date(df("timestamp")).gt(lit(now.get(Calendar.MINUTE)-10)))

Any help will be appreciated.

1 Answer


Given the dataframe as

+----+------------------+-----+
|ID  |timestamp         |value|
+----+------------------+-----+
|ID-1|8/23/2017 14:48:13|4.56 |
|ID-2|8/23/2017 6:5:21  |5.92 |
|ID-3|8/23/2017 5:49:13 |6.0  |
+----+------------------+-----+ 

and current time as

2017-08-23 14:53:33

You can do the following

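import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType
// assumes spark.implicits._ is in scope for the $"..." column syntax

df.withColumn("timestamp", unix_timestamp($"timestamp", "MM/dd/yyyy HH:mm:ss")) // string -> epoch seconds
  .filter((unix_timestamp(current_timestamp()) - $"timestamp") / 60 < 10)       // keep rows less than 10 minutes old
  .select($"ID", $"timestamp".cast(TimestampType), $"value")                    // epoch seconds -> timestamp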

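where unix_timestamp converts the timestamp string to epoch seconds (a long) and current_timestamp() returns the current date and time. Subtracting the two gives the age of each row in seconds, and dividing by 60 gives the age in minutes.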
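The same arithmetic can be checked outside Spark against the current time given above. This is only a minimal sketch in plain Scala with java.time, not part of the Spark solution. The object name AgeCheck and its helpers are made up for illustration, and the pattern M/d/yyyy H:m:s is an assumption chosen so that single-digit fields such as 6:5:21 parse.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object AgeCheck {
  // Single-letter pattern fields accept one- or two-digit values (e.g. "6:5:21").
  private val fmt = DateTimeFormatter.ofPattern("M/d/yyyy H:m:s")

  // Age of a timestamp string in whole seconds, relative to `now`.
  def ageSeconds(ts: String, now: LocalDateTime): Long =
    Duration.between(LocalDateTime.parse(ts, fmt), now).getSeconds

  // Mirrors the filter condition: (now - timestamp) / 60 < 10.
  def isRecent(ts: String, now: LocalDateTime): Boolean =
    ageSeconds(ts, now) / 60.0 < 10

  def main(args: Array[String]): Unit = {
    // The "current time" used in the answer.
    val now = LocalDateTime.of(2017, 8, 23, 14, 53, 33)
    Seq("8/23/2017 14:48:13", "8/23/2017 6:5:21", "8/23/2017 5:49:13").foreach { ts =>
      println(s"$ts -> age ${ageSeconds(ts, now)}s, recent = ${isRecent(ts, now)}")
    }
  }
}
```

With that current time, ID-1 is 320 seconds old (about 5.3 minutes), so it passes the filter; the other two rows are several hours old and are dropped.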

The final dataframe should be:

+----+---------------------+-----+
|ID  |timestamp            |value|
+----+---------------------+-----+
|ID-1|2017-08-23 14:48:13.0|4.56 |
+----+---------------------+-----+

I hope the answer is helpful


1 Comment

How can I filter a dataframe on the timestamp column so that only data that is 30 days old or older is kept? I am new to Scala. Please help me with this as well.
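For the 30-day case, one possible approach is to compare calendar dates with datediff rather than seconds. This is a rough sketch, assuming the same string format as in the question. The object and method name OlderThan30Days.olderThan30Days and the temporary column ts are hypothetical. On newer Spark versions date parsing may be stricter, so the pattern may need adjusting (for example to M/d/yyyy H:m:s).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical helper, for illustration only.
object OlderThan30Days {
  // Keeps rows whose timestamp falls at least 30 days before the current date.
  def olderThan30Days(df: DataFrame): DataFrame =
    df
      // Parse the string to epoch seconds, then cast to a timestamp column.
      .withColumn("ts", unix_timestamp(col("timestamp"), "MM/dd/yyyy HH:mm:ss").cast("timestamp"))
      // datediff(end, start) returns the number of days between the two dates.
      .filter(datediff(current_date(), col("ts")) >= 30)
      // Drop the helper column so the original schema is preserved.
      .drop("ts")
}
```

Calling OlderThan30Days.olderThan30Days(df) on the dataframe from the question would return the rows that are 30 or more days old, with the original columns unchanged.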
