
I have a pandas DataFrame in Azure Databricks. I need to save it as ONE CSV file on Azure Data Lake Gen2.

I've tried:

df.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv(dstPath)

and

df.write.format("csv").mode("overwrite").save(dstPath)

but both produce 10 CSV part files, while I need a single file that I can name.

Thanks in advance.

  • You can use .coalesce(1) to pull all the data into a single partition before writing: df.coalesce(1).write... (a sketch follows these comments). Just beware the performance can take a serious hit. Commented Jun 22, 2021 at 13:12
  • @JoelCochran It works, but is it possible to name this file? If dstPath looks like '/mnt/path/file.csv', a folder named file.csv is created instead of a file. Commented Jun 22, 2021 at 15:52
  • Unfortunately, I can't help there. There are a lot of other threads that discuss that question. Good luck. Commented Jun 22, 2021 at 17:24
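
A minimal sketch of the coalesce(1) approach from the first comment, assuming df is a Spark DataFrame (as the write syntax in the question suggests) and dstPath is a folder under an ADLS Gen2 mount; the temporary folder and the dbutils rename step are my additions, not part of the original comments:

# Runs in a Databricks notebook, where dbutils is available as a global.
# tmp_path and the target file name below are placeholders.
tmp_path = dstPath + "_tmp"

# coalesce(1) forces a single partition, so Spark emits exactly one part file.
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv(tmp_path))

# Spark still writes a folder containing one part-*.csv;
# move that part file to the desired name and clean up the temporary folder.
part_file = [f.path for f in dbutils.fs.ls(tmp_path) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, "/mnt/path/file.csv")
dbutils.fs.rm(tmp_path, True)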

1 Answer


I've found a solution:

df.to_csv('/dbfs/mnt/....../df.csv', sep=',', header=True, index=False)
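This presumably works because paths under /dbfs/ go through the Databricks FUSE mount, so pandas writes one ordinary file with the name you give it, rather than a Spark output folder. If your DataFrame is actually a Spark DataFrame, a hedged sketch of the same idea would be the following; toPandas() and the mount path are illustrative, and this assumes the data fits in driver memory:

# Assumption: spark_df is a Spark DataFrame small enough to collect on the driver.
pdf = spark_df.toPandas()

# Writing through the /dbfs FUSE mount produces a single named file;
# the mount path below is a placeholder.
pdf.to_csv('/dbfs/mnt/<your-mount>/df.csv', sep=',', header=True, index=False)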

1 Comment

This works; I wonder why.
