Create a csv file with timestamp as file name using a dataframe scala

Question

I have a dataframe with data as follows.

+---------------+-------+
|category       |marks  |
+---------------+-------+
|cricket        |1.0    |
|tennis         |1.0    |
|football       |2.0    |
+---------------+-------+

I want to write the above dataframe to a CSV file whose name is generated from the current timestamp.

generatedDataFrame.write.mode ("append")
    .format("com.databricks.spark.csv").option("delimiter", ";").save("./src/main/resources-"+LocalDateTime.now()+".csv")

But this code does not work properly. It gives the following error:

java.io.IOException: Mkdirs failed to create file

Is there a better way to achieve this using Scala and Spark? Also, even though I am trying to create a file named with the timestamp, the code creates a directory with that name, and inside that directory a CSV file with the data is created under a random name. How can I give the CSV file itself the timestamp name instead of creating a directory?

Mohana B C · Accepted Answer · 2021-03-01 18:51:43Z

DF.write.csv always creates a folder with the name you specified and places the output CSV files (one part file per partition) inside that folder.

If you want a single CSV file as output, named with the timestamp, you can write with coalesce(1) and then rename the generated part file, as in the code below:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql._
import org.apache.hadoop.fs.{FileSystem, Path}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

// Hadoop FileSystem handle, used below to locate and rename the part file
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// coalesce(1) makes Spark write a single part file into the output folder
generatedDataFrame.coalesce(1).write.mode("append").csv("./src/main/resources/outputcsv/")

// Name of the part file that Spark generated (part-*.csv)
val outFileName = fs.globStatus(new Path("./src/main/resources/outputcsv/part*"))(0).getPath.getName

// Timestamp without characters such as ':' in the file name
val timestamp = new SimpleDateFormat("yyyyMMddHHmm").format(new Date())

// Rename the part file to <timestamp>.csv inside the same folder
fs.rename(new Path(s"./src/main/resources/outputcsv/$outFileName"), new Path(s"./src/main/resources/outputcsv/${timestamp}.csv"))

Nav · Accepted Answer · 2021-03-01 18:48:43Z

You should use src/main/resources rather than ./src/main/resources. You can check from the command line whether you have permission to create directories there. Also, using LocalDateTime.now() directly in the path produces something like "2021-03-01T13:39:09.646". I am not sure that is what you want, or whether it is even valid for HDFS paths (because of characters like ':'), so I would suggest formatting the date as well.
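
The date-formatting suggestion could look something like the sketch below. The object name TimestampedName, the helper timestampedFileName, the "resources-" prefix, and the "yyyyMMdd-HHmmss" pattern are all assumptions chosen for illustration and are not part of the question; the only point is that the resulting name contains no ':' or fractional-second '.' characters.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper (not from the question): builds a file name from a
// timestamp using a pattern that avoids ':' inside the name.
object TimestampedName {
  // Assumed pattern; any pattern without ':' would serve the same purpose.
  private val formatter = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss")

  def timestampedFileName(now: LocalDateTime, prefix: String = "resources-"): String =
    s"$prefix${now.format(formatter)}.csv"
}
```

For example, TimestampedName.timestampedFileName(LocalDateTime.of(2021, 3, 1, 13, 39, 9)) returns "resources-20210301-133909.csv", which can then be used as the target name when renaming the part file.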
