I am very new to pyspark and getting below error, even if drop all date related columns or selecting only one column. Date format stored in my data frame like "
". Can anyone please suggest changes I could made in dataframe to resolve this/date formats supported by new parser.
It's working if I set "spark.sql.legacy.timeParserPolicy" to "LEGACY"
[INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Caused by: DateTimeParseException: Text '1/1/2023 3:57:22 AM' could not be parsed at index 0 org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 15.0 failed 4 times, most recent failure: Lost task 4.3 in stage 15.0 (TID 355) (10.139.64.5 executor 0): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '1/1/2023 3:57:22 AM' in the new parser. You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string.
Example:
#spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
from pyspark.sql.functions import *
from pyspark.sql import functions as F
emp = [(1, "AAA", "dept1", 1000, "12/22/2022 3:11:44 AM"),
(2, "BBB", "dept1", 1100, "12/22/2022 3:11:44 AM"),
(3, "CCC", "dept1", 3000, "12/22/2022 3:11:44 AM"),
(4, "DDD", "dept1", 1500, "12/22/2022 3:11:44 AM"),
(5, "EEE", "dept2", 8000, "12/22/2022 3:11:44 AM"),
(6, "FFF", "dept2", 7200, "12/22/2022 3:11:44 AM"),
(7, "GGG", "dept3", 7100, "12/22/2022 3:11:44 AM"),
(8, "HHH", "dept3", 3700, "12/22/2022 3:11:44 PM"),
(9, "III", "dept3", 4500, "12/22/2022 3:11:44 PM"),
(10, "JJJ", "dept5", 3400,"12/22/2022 3:11:44 PM")]
empdf = spark.createDataFrame(emp, ["id", "name", "dept", "salary",
"date"])
#empdf.printSchema()
df = empdf.withColumn("date", F.to_timestamp(col("date"),
"MM/dd/yyyy hh:mm:ss a"))
df.show(12,False)
Thanks a lot, in Advance