1

I have a Dataframe with a date column representing Unix timestamp in String format. The column values need to be formatted to a different String representation as below -

Input Dataframe

+----+----+-------------+
|name|code|         date|
+----+----+-------------+
|   A|   1|1545905416000|
|   B|   3|1545905416000|
|   C|   5|1545905416000|
+----+----+-------------+

Expected output Dataframe

+----+----+-------------+
|name|code|         date|
+----+----+-------------+
|   A|   1|   2018-12-27|
|   B|   3|   2018-12-27|
|   C|   5|   2018-12-27|
+----+----+-------------+

This didn't work as it is giving null for all values -

 peopleDFCsv.withColumn("formatted_date", 
            functions.date_format(functions.col("date"), "yyyy-MM-dd"))
            .show();
1

1 Answer 1

2

The date_format function works with timestamp not milliseconds since epoch. Try doing the conversion with CAST:

df.withColumn("formatted_date",
  functions.date_format(expr("CAST(date/1000 AS TIMESTAMP)"), "yyyy-MM-dd"))
  .show()

//    Outputs:
//    +----+----+-------------+--------------+
//    |name|code|         date|formatted_date|
//    +----+----+-------------+--------------+
//    |   A|   1|1545905416000|    2018-12-27|
//    |   B|   3|1545905416000|    2018-12-27|
//    |   C|   5|1545905416000|    2018-12-27|
//    +----+----+-------------+--------------+

The reason why you are getting null is because date_format is interpreting your string as a timestamp such as "2018-12-27 11:10:16". The plain number of does not conform to the format it expects so it just returns null.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.