I have a DataFrame with two string columns, c1dt and c2tm, whose formats are yyyymmdd and yyyymmddTHHmmss.SSSz respectively. I want to convert them to date and timestamp columns. I tried the following, but it doesn't work: the column values come back as null.

val newdf = df.withColumn("c1dt", unix_timestamp("c1dt","yyyymmdd").cast("date").withColumn("c2tm","yyyymmddTHHmmss.SSSz").cast("timestamp"))

When I call newdf.show, both columns show null values. If I print the original DataFrame df, I see the date and timestamp values.

1 Answer

Since your timestamp format is not the default one, your best bet is probably to create a UDF. Note that in the pattern, MM means month (mm means minutes) and the literal T has to be quoted.

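import java.sql.Timestamp
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf

// Parses s with the given pattern; MM is month (mm is minutes) and the literal T is quoted.
def _stringToTs(pattern: String)(s: String): Timestamp = {
  val format = new SimpleDateFormat(pattern)
  val date = format.parse(s)
  new Timestamp(date.getTime)
}
val stringToDate = udf(_stringToTs("yyyyMMdd") _)
val stringToTS = udf(_stringToTs("yyyyMMdd'T'HHmmss.SSSz") _)
val newdf = df
  .withColumn("c1dt", stringToDate($"c1dt").cast("date"))
  .withColumn("c2tm", stringToTS($"c2tm"))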
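
As an alternative to a UDF (a different technique from the one above), Spark 2.2 and later provide to_date and to_timestamp overloads that take a format string. The sketch below assumes that Spark version and uses a made-up sample row, since the question does not show real values. The pattern letters are the important part: MM is month while mm is minutes, and the literal T must be quoted as 'T'. A pattern that does not match the data is a likely cause of the null values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

// Local session for illustration only; in spark-shell, spark already exists.
val spark = SparkSession.builder().master("local[1]").appName("parse-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample row; the real values in the question are not shown.
val df = Seq(("20170412", "20170412T101530.123UTC")).toDF("c1dt", "c2tm")

val newdf = df
  .withColumn("c1dt", to_date(col("c1dt"), "yyyyMMdd"))                     // string -> date
  .withColumn("c2tm", to_timestamp(col("c2tm"), "yyyyMMdd'T'HHmmss.SSSz"))  // string -> timestamp

newdf.printSchema()
```

Calling newdf.show(false) afterwards is a quick way to check whether the values actually parse. How the z zone field and fractional seconds are handled can differ between Spark versions, so it is worth testing against a few real rows.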

If your data is coming from a CSV file, you can specify the timestamp format before you load the data, which will be faster overall:

spark.read
      .format("csv")
      .option("inferSchema", "true") // Automatically infer data types
      .option("timestampFormat", "yyyyMMdd'T'HHmmss.SSSz") // MM = month; the literal T is quoted
      .load("path")
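
With schema inference, a value like 20170412 will most likely be read as an integer rather than a date, so it may be safer to pass an explicit schema together with the dateFormat and timestampFormat options. The following is a minimal sketch under that assumption. The temporary file, its single row, and the column order are hypothetical stand-ins for the real CSV.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DateType, StructField, StructType, TimestampType}

// Local session for illustration only; in spark-shell, spark already exists.
val spark = SparkSession.builder().master("local[1]").appName("csv-demo").getOrCreate()

// Hypothetical one-row file standing in for the real CSV.
val path = Files.createTempFile("sample", ".csv")
Files.write(path, "20170412,20170412T101530.123UTC\n".getBytes("UTF-8"))

// Declare the target types up front instead of relying on inferSchema.
val schema = StructType(Seq(
  StructField("c1dt", DateType),
  StructField("c2tm", TimestampType)
))

val csvDf = spark.read
  .format("csv")
  .schema(schema)
  .option("dateFormat", "yyyyMMdd")                     // applies to DateType columns
  .option("timestampFormat", "yyyyMMdd'T'HHmmss.SSSz")  // applies to TimestampType columns
  .load(path.toString)

csvDf.printSchema()
```

Declaring the schema also avoids the extra pass over the file that inferSchema needs.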