
I have a pandas DataFrame in Azure Databricks. I need to save it as ONE CSV file on Azure Data Lake Gen2.

I've tried:

df.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv(dstPath)

and

df.write.format("csv").mode("overwrite").save(dstPath)

but both produce 10 CSV part files, while I need a single file that I can name.

Thanks in advance.

  • You can use .coalesce(1) to pull all the data into a single partition before writing: df.coalesce(1).write... (a sketch follows these comments). Just beware the performance can take a serious hit. Commented Jun 22, 2021 at 13:12
  • @JoelCochran It works, but is it possible to name this file? If dstPath looks like '/mnt/path/file.csv', a folder named file.csv is created instead of a file. Commented Jun 22, 2021 at 15:52
  • Unfortunately, I can't help there. There are a lot of other threads that discuss that question. Good luck. Commented Jun 22, 2021 at 17:24
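
A minimal sketch of the coalesce(1) approach from the first comment, assuming df is a Spark DataFrame (as the write syntax in the question suggests) and dstPath is a folder under an ADLS Gen2 mount; the temporary folder and the dbutils rename step are my additions, not part of the original comments:

# Runs in a Databricks notebook, where dbutils is available as a global.
# tmp_path and the target file name below are placeholders.
tmp_path = dstPath + "_tmp"

# coalesce(1) forces a single partition, so Spark emits exactly one part file.
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv(tmp_path))

# Spark still writes a folder containing one part-*.csv;
# move that part file to the desired name and clean up the temporary folder.
part_file = [f.path for f in dbutils.fs.ls(tmp_path) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, "/mnt/path/file.csv")
dbutils.fs.rm(tmp_path, True)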

1 Answer


I've found a solution:

df.to_csv('/dbfs/mnt/....../df.csv', sep=',', header=True, index=False)
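This presumably works because paths under /dbfs/ go through the Databricks FUSE mount, so pandas writes one ordinary file with the name you give it, rather than a Spark output folder. If your DataFrame is actually a Spark DataFrame, a hedged sketch of the same idea would be the following; toPandas() and the mount path are illustrative, and this assumes the data fits in driver memory:

# Assumption: spark_df is a Spark DataFrame small enough to collect on the driver.
pdf = spark_df.toPandas()

# Writing through the /dbfs FUSE mount produces a single named file;
# the mount path below is a placeholder.
pdf.to_csv('/dbfs/mnt/<your-mount>/df.csv', sep=',', header=True, index=False)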

1 Comment

This works; I wonder why.
