
How can I validate an incoming date that is in yyyyMM format and compare it with the current timestamp (in yyyyMMdd format) up to the current month? If the month of the incoming date exceeds the current month, it should be rejected; otherwise the date field should be populated. The validation should use from_unixtime(unix_timestamp(...)) in Spark SQL.

This is needed inside a CASE WHEN statement (Spark SQL).

For example:

EDITED:

spark.sql("""
  select
    case when (length(date)>0
               and from_unixtime(unix_timestamp(date,'yyyyMMdd'),'yyyyMMdd') == date
               and substring(from_unixtime(unix_timestamp(date,'yyyyMMdd'),'yyyyMMdd'),5,6) == month(current_date()))
         then substring(date,1,8) else null end as date,
    case when (length(date)>0
               and from_unixtime(unix_timestamp(date,'yyyyMMdd'),'yyyyMMdd') == date
               and substring(from_unixtime(unix_timestamp(date,'yyyyMMdd'),'yyyyMMdd'),5,6) == month(current_date()))
         then 'Y' else 'DOB: should be present in YYYYMMDD format' end as date_flag
  from input
""").show(false)

In the edited query above, the result is null when the incoming month is compared with the current month, although it should return 'Y'.

INPUT: 20210801 EXPECTED OUTPUT: 20210801 Y
INPUT: 20210301 EXPECTED OUTPUT: NULL Y
INPUT: 03091998 EXPECTED OUTPUT: NULL DOB: should be present in YYYYMMDD format

NOTE: The comparison is based on the month. If the date falls in the current month, output the date; otherwise reject it.

2 Answers


You can convert the current date to a string:

where col > from_unixtime(unix_timestamp(), 'yyyyMM')
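
For the CASE WHEN form asked for in the question, here is a minimal sketch rather than a definitive implementation. It assumes the temp view `input` with a string column `date` in yyyyMMdd form, as in the question, and it assumes the "current month" check should match on both year and month (yyyyMM). The object name `CurrentMonthCheck`, the local SparkSession, and the sample rows are illustrative only. It also assumes a non-ANSI session, where a failed parse yields NULL instead of an error. One thing to note about the query in the question: `substring(str, 5, 6)` takes up to six characters starting at position 5, so for 20210801 it yields 0801, which does not equal `month(current_date())`. Comparing the first six characters with `date_format(current_date(), 'yyyyMM')` sidesteps that.

```scala
import org.apache.spark.sql.SparkSession

object CurrentMonthCheck {
  // `input` and its string column `date` are the names used in the question.
  // Assumption: "current month" means the same year and month (yyyyMM).
  val query: String =
    """select
      |  case when from_unixtime(unix_timestamp(date, 'yyyyMMdd'), 'yyyyMMdd') = date
      |        and substring(date, 1, 6) = date_format(current_date(), 'yyyyMM')
      |       then date end as date,
      |  case when from_unixtime(unix_timestamp(date, 'yyyyMMdd'), 'yyyyMMdd') = date
      |       then 'Y' else 'DOB: should be present in YYYYMMDD format' end as date_flag
      |from input""".stripMargin

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; reuse the existing `spark` in a shell or job.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("current-month-check")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question.
    Seq("20210801", "20210301", "03091998").toDF("date").createOrReplaceTempView("input")

    spark.sql(query).show(false)
    spark.stop()
  }
}
```

The first CASE only populates the date when the value round-trips through the yyyyMMdd pattern and falls in the current year and month; the second CASE flags format validity on its own, which is what the expected output for 20210301 (NULL, Y) suggests.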

9 Comments

Can you please provide this in a CASE WHEN statement that compares against the current timestamp?
@Vicky . . . unix_timestamp() returns the current timestamp with no arguments.
I wonder which values will end up in the result if execution starts on Aug 31 and continues into Sep 1.
@pasha701 . . . I would guess that the date used would be the date when the query began (that is how many databases work). However, you might have to dive into the documentation to check.
From the documentation: "All calls of unix_timestamp within the same query return the same value". Thanks.

To filter out dates that exceed the current month, build the next-month string first and compare the DataFrame values against it:

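import java.time.LocalDate
import java.time.format.DateTimeFormatter
import spark.implicits._ // assumes an existing SparkSession named `spark`

val df = Seq("20210603", "20210812", "20210901").toDF("date")
// Next month in yyyyMM form, e.g. "202109" when run in August 2021
val nextMonthString = LocalDate.now().plusMonths(1).format(DateTimeFormatter.ofPattern("yyyyMM"))

df
  .where($"date" <= nextMonthString)
  .show(false)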

Output:

+--------+
|date    |
+--------+
|20210603|
|20210812|
+--------+
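
Why the bound has to be the next month rather than the current one may not be obvious, so here is a small plain-Scala sketch of the string ordering involved. It assumes that Spark compares these digit-only strings in the same lexicographic order as Scala does; the helper `atOrBefore` and the object name are hypothetical and not part of the answer.

```scala
object MonthPrefixComparison {
  // Hypothetical helper: true when a yyyyMMdd string sorts at or before
  // the given yyyyMM bound in plain lexicographic order.
  def atOrBefore(date: String, bound: String): Boolean = date <= bound

  def main(args: Array[String]): Unit = {
    val dates = Seq("20210603", "20210812", "20210901")

    // Bound = next month: August is kept, September is dropped, because
    // "20210901" starts with "202109" and is longer, so it sorts after it.
    println(dates.filter(atOrBefore(_, "202109"))) // List(20210603, 20210812)

    // Bound = current month: August is dropped as well, because
    // "20210812" starts with "202108" and is longer, so it sorts after it.
    println(dates.filter(atOrBefore(_, "202108"))) // List(20210603)
  }
}
```

Since an eight-character date never equals a six-character bound, `<=` behaves like `<` here, and the next-month string acts as an exclusive upper limit.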

If two columns are required, as in the latest update to the question:

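import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{StringType, TimestampType}

val inputDF = Seq("20210801", "20210301", "03091998").toDF("date")
inputDF
  // Parse as yyyyMMdd; values that do not match the pattern become null
  .withColumn("dateTimeStamp", unix_timestamp($"date", "yyyyMMdd").cast(TimestampType))
  // Keep the date only when its month equals the current month (the year is not compared)
  .withColumn("currentMonthDate", when(month($"dateTimeStamp") === month(current_date()), $"date").otherwise(lit(null).cast(StringType)))
  .withColumn("dateFormatCorrect", when($"dateTimeStamp".isNotNull, lit("Y")).otherwise(lit("DOB: should be present in YYYYMMDD format").cast(StringType)))
  .select("currentMonthDate", "dateFormatCorrect")
  .show(false)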

Output is:

+----------------+-----------------------------------------+
|currentMonthDate|dateFormatCorrect                        |
+----------------+-----------------------------------------+
|20210801        |Y                                        |
|null            |Y                                        |
|null            |DOB: should be present in YYYYMMDD format|
+----------------+-----------------------------------------+
