3

I am very new to PySpark and am getting the error below, even if I drop all date-related columns or select only one column. Dates in my DataFrame are stored like "1/1/2023 3:57:22 AM" (the value shown in the error). Can anyone suggest what I could change in the DataFrame to resolve this, or which date formats the new parser supports? It works if I set "spark.sql.legacy.timeParserPolicy" to "LEGACY".

[INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Caused by: DateTimeParseException: Text '1/1/2023 3:57:22 AM' could not be parsed at index 0 org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 15.0 failed 4 times, most recent failure: Lost task 4.3 in stage 15.0 (TID 355) (10.139.64.5 executor 0): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '1/1/2023 3:57:22 AM' in the new parser. You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0, or set to "CORRECTED" and treat it as an invalid datetime string.

Example:

#spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
from pyspark.sql.functions import *
from pyspark.sql import functions as F

emp = [(1, "AAA", "dept1", 1000, "12/22/2022  3:11:44 AM"),
(2, "BBB", "dept1", 1100, "12/22/2022  3:11:44 AM"),
(3, "CCC", "dept1", 3000, "12/22/2022  3:11:44 AM"),
(4, "DDD", "dept1", 1500, "12/22/2022  3:11:44 AM"),
(5, "EEE", "dept2", 8000, "12/22/2022  3:11:44 AM"),
(6, "FFF", "dept2", 7200, "12/22/2022  3:11:44 AM"),
(7, "GGG", "dept3", 7100, "12/22/2022  3:11:44 AM"),
(8, "HHH", "dept3", 3700, "12/22/2022  3:11:44 PM"),
(9, "III", "dept3", 4500, "12/22/2022  3:11:44 PM"),
(10, "JJJ", "dept5", 3400,"12/22/2022 3:11:44 PM")]
empdf = spark.createDataFrame(emp, ["id", "name", "dept", "salary", "date"])

#empdf.printSchema()
# This to_timestamp call is what fails once an action such as show() runs
df = empdf.withColumn("date", F.to_timestamp(F.col("date"), "MM/dd/yyyy hh:mm:ss a"))
df.show(12, False)

Thanks a lot in advance.

2
  • Could you please add the sample data, the code which triggers this error and desired output? Commented Jan 2, 2023 at 18:47
  • Hi @BartoszGajda, Really sorry for late reply, I have added sample code now. Commented Jan 18, 2023 at 10:50

3 Answers

3

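In general, the new parser seems to be stricter about field widths. Two-letter patterns such as MM, dd or hh expect exactly two digits (zero-padded), so a value like 1/1/2023 3:57:22 AM does not match them, whereas the single-letter forms M, d and h accept one or two digits.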

Example 1:

# Does not work: 
df = df.withColumn("date", F.to_timestamp(col("date"), "MM/dd/yyyy  hh:mm:ss a"))

# Does work:
df = df.withColumn("date", F.to_timestamp(col("date"), "M/d/yyyy h:m:s a"))

Example 2:

# Does not work: 
MM/dd/yyyy H:mm:ss a

# Does work: 
MM/d/yyyy H:mm:ss a

In addition, the new parser seems to be less tolerant of input that the pattern does not fully describe, e.g. when extracting a date from a string that also contains a time.

Example 3:

String example: 02.03.2004 10:35

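# psf is assumed to be: from pyspark.sql import functions as psf

# Does not work: 
df.withColumn(
  "parsed_date", 
  psf.to_date("unparsed_date", "dd.MM.yyyy"))

# Does work (substr positions are 1-based, so this keeps the first 10 characters): 
df.withColumn(
  "parsed_date", 
  psf.to_date(psf.col("unparsed_date").substr(1, 10), "d.M.y"))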

2 Comments

In my case switching from 'MM/dd/yyyy H:mm:ss a' to 'MM/d/yyyy H:mm:ss a' made it work without the error. The date looks like this: 1/13/2020 1:11:34 AM
In my case with entries like this 02.03.2004 10:35 I had to substring the date first to then make it work via: ... .withColumn("parsed_date", psf.to_date(psf.col("unparsed_date").substr(0, 10), "d.M.y"))
2

The following settings worked for me:

spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

1 Comment

Doing this surely helped. So thank you! For spark 3.4.1 the format to change from string to timestamp that worked for me was "d/MM/yyyy h:m" (case sensitive)
0

Add this line at the beginning of the notebook:

spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")

1 Comment

This worked for me! Thanks!
