I want to find out whether an array contains a given date. If it does, I need to record that in a new column.
dataset = dataset.withColumn("incoming_timestamp", col("incoming_timestamp").cast("timestamp"))
        .withColumn("incoming_date", to_date(col("incoming_timestamp")));
My incoming_timestamp is 2021-03-30 00:00:00; after converting it to a date it is 2021-03-30.
The output dataset looks like this:
+-----+-------------------+-------------+
|col 1|incoming_timestamp |incoming_date|
+-----+-------------------+-------------+
|val1 |2021-03-30 00:00:00|2021-07-06   |
|val2 |2020-03-30 00:00:00|2020-03-30   |
|val3 |1889-03-30 00:00:00|1889-03-30   |
+-----+-------------------+-------------+
I have a String declared like this:
String Dates = "2021-07-06,1889-03-30";
I want to add one more column to the result dataset that indicates whether the incoming date is present in the Dates string, like this:
+-----+-------------------+-------------+------+
|col 1|incoming_timestamp |incoming_date|result|
+-----+-------------------+-------------+------+
|val1 |2021-03-30 00:00:00|2021-07-06   |true  |
|val2 |2020-03-30 00:00:00|2020-03-30   |false |
|val3 |1889-03-30 00:00:00|1889-03-30   |true  |
+-----+-------------------+-------------+------+
For that, I think I first need to convert this String into an array and then use array_contains(array, value), which returns true if the array contains the value.
I tried the following:
METHOD 1
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
Date[] dateArr = Arrays.stream(Dates.split(","))
        .map(d -> LocalDate.parse(d, formatter))
        .toArray(Date[]::new);
This throws java.lang.ArrayStoreException: java.time.LocalDate
METHOD 2
SimpleDateFormat formatter = new SimpleDateFormat("YYYY-MM-DD", Locale.ENGLISH);
formatter.setTimeZone(TimeZone.getTimeZone("America/New_York"));
Date[] dateArr = Arrays.stream(Dates.split(",")).map(d -> {
    try {
        return formatter.parse(d);
    } catch (ParseException e) {
        e.printStackTrace();
    }
    return null;
}).toArray(Date[]::new);
dataset = dataset.withColumn("result", array_contains(col("incoming_date"), dateArr));
This throws:
org.apache.spark.sql.AnalysisException: Unsupported component type class java.util.Date in arrays
Can anyone help with this?
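For reference, here is a minimal sketch of the String-to-array step only. It is not a confirmed fix. It assumes the ArrayStoreException in METHOD 1 comes from storing java.time.LocalDate elements in a Date[] array, and that java.sql.Date (rather than java.util.Date) is a type Spark accepts as a literal. The class name DateArrayDemo and the helper parseDates are hypothetical names used only for illustration.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.util.Arrays;

public class DateArrayDemo {

    // Hypothetical helper: parses a comma-separated list of ISO dates
    // (yyyy-MM-dd) into java.sql.Date values.
    static Date[] parseDates(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .map(LocalDate::parse) // LocalDate.parse uses the ISO yyyy-MM-dd format by default
                .map(Date::valueOf)    // convert each LocalDate to java.sql.Date before storing
                .toArray(Date[]::new); // element type now matches the array type
    }

    public static void main(String[] args) {
        String Dates = "2021-07-06,1889-03-30";
        Date[] dateArr = parseDates(Dates);
        System.out.println(Arrays.toString(dateArr)); // prints [2021-07-06, 1889-03-30]
    }
}
```

With Spark on the classpath, an array like this could presumably be passed to Column.isin, for example col("incoming_date").isin((Object[]) dateArr), instead of array_contains, since the dates live in a Java array rather than in an array column. I have not verified that against this dataset.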