
I'm new to Databricks and need help writing a pandas DataFrame to the Databricks file system (DBFS).

I searched Google but could not find a similar case, and the help guide provided by Databricks (attached) did not work either. I attempted the variations below to try my luck; the commands run just fine, but the file never gets written to the directory (the expected wrtdftodbfs.txt file is never created).

  1. df.to_csv("/dbfs/FileStore/NJ/wrtdftodbfs.txt")

Result: throws the below error

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStore/NJ/wrtdftodbfs.txt'

  2. df.to_csv("\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  3. df.to_csv("dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  4. df.to_csv(path ="\\dbfs\\FileStore\\NJ\\",file="wrtdftodbfs.txt")

Result: TypeError: to_csv() got an unexpected keyword argument 'path'

  5. df.to_csv("dbfs:\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

  6. df.to_csv("dbfs:\\dbfs\\FileStore\\NJ\\wrtdftodbfs.txt")

Result: No errors, but nothing written either

The directory exists, and files created manually show up there, but pandas to_csv never writes anything and never errors out.

dbutils.fs.put("/dbfs/FileStore/NJ/tst.txt","Testing file creation and existence")

dbutils.fs.ls("dbfs/FileStore/NJ")

Out[186]: [FileInfo(path='dbfs:/dbfs/FileStore/NJ/tst.txt', name='tst.txt', size=35)]

Appreciate your time, and pardon me if the enclosed details are not clear enough.

  • Try converting it to a Spark DataFrame, then save it as a CSV; pandas most likely doesn't have access to the FileStore. Commented Dec 19, 2019 at 21:15
  • Is it a Spark DataFrame or a pandas one? The code at the top talks about Spark, but everything else looks like pandas. If it involves pandas, you need to create the file with df.to_csv and then use dbutils.fs.put() to put the file you made into the FileStore, following here. If it involves Spark, see here. Commented Dec 19, 2019 at 21:16
  • Have you tried: with open("/dbfs/FileStore/NJ/wrtdftodbfs.txt", "w") as f: df.to_csv(f)? Commented Dec 19, 2019 at 21:17
  • Thanks for the response, Mende. I did try that, but no luck; it runs fine but the file never makes it into the directory. Commented Dec 19, 2019 at 21:44
  • Thanks so much, Wayne. The second link you shared worked. I converted the pandas DataFrame to Spark. Not sure if the Databricks FileStore works only through Spark commands for writing data to its file system. Commented Dec 19, 2019 at 21:59
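The FileNotFoundError from the first attempt is the usual symptom of a missing parent directory: pandas' to_csv does not create intermediate directories. A minimal sketch of that fix, using a local temp directory as a stand-in for the /dbfs FUSE mount (the FileStore/NJ layout and file name are taken from the question):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Stand-in for the /dbfs mount; on Databricks this would be "/dbfs/FileStore/NJ".
base = tempfile.mkdtemp()
target_dir = os.path.join(base, "FileStore", "NJ")

# to_csv raises FileNotFoundError when the parent directory is missing,
# so create it first.
os.makedirs(target_dir, exist_ok=True)

out_path = os.path.join(target_dir, "wrtdftodbfs.txt")
df.to_csv(out_path, index=False)

print(os.path.exists(out_path))  # True
```

On a Databricks cluster the same pattern applies with the literal path "/dbfs/FileStore/NJ" in place of the temp directory.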

3 Answers


Try this in your Databricks notebook:

import pandas as pd
from io import StringIO

data = """
CODE,L,PS
5d8A,N,P60490
5d8b,H,P80377
5d8C,O,P60491
"""

df = pd.read_csv(StringIO(data), sep=',')
#print(df)
df.to_csv('/dbfs/FileStore/NJ/file1.txt')

pandas_df = pd.read_csv("/dbfs/FileStore/NJ/file1.txt", header='infer') 
print(pandas_df)

3 Comments

Thanks Giovani. It worked; it seems the files are getting written, but they do not physically show up when validated through GUI navigation or through the fs ls command.
Try %sh find / -type f -name "file2.txt" to search recursively @ShaanProms
Awesome! I see it. :) Thanks! The DBFS commands %fs ls /dbfs/FileStore/NJ OR dbutils.fs.ls('/dbfs/FileStore/NJ') do not show this file for some reason.
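On Databricks, /dbfs is the local FUSE mount of the DBFS root, so dbutils.fs.ls('/dbfs/FileStore/NJ') actually looks in dbfs:/dbfs/FileStore/NJ rather than dbfs:/FileStore/NJ, which would explain the missing listing above. A hypothetical helper (not part of any Databricks API) illustrating how a FUSE path maps to the path form dbutils.fs expects:

```python
def fuse_to_dbutils_path(path: str) -> str:
    """Map a /dbfs FUSE path (used by pandas, open(), etc.) to the
    equivalent dbutils.fs path. Illustrative only."""
    prefix = "/dbfs/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /dbfs FUSE path: {path}")
    # dbutils paths are rooted at the DBFS root, so the /dbfs prefix is dropped.
    return "/" + path[len(prefix):]

print(fuse_to_dbutils_path("/dbfs/FileStore/NJ/file1.txt"))
# -> /FileStore/NJ/file1.txt
```

So the listing in the comment above would be dbutils.fs.ls('/FileStore/NJ').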

This worked out for me:

outname = 'pre-processed.csv'
outdir = '/dbfs/FileStore/'
dfPandas.to_csv(outdir+outname, index=False, encoding="utf-8")

To download the file, add files/filename to your notebook URL (before the question mark ?):

https://community.cloud.databricks.com/files/pre-processed.csv?o=189989883924552#

(you need to edit your home URL; for me it is:

https://community.cloud.databricks.com/?o=189989883924552#)
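The URL edit described above amounts to a small string manipulation; the workspace id and filename below are the ones from this answer:

```python
home_url = "https://community.cloud.databricks.com/?o=189989883924552#"
filename = "pre-processed.csv"

# Insert "files/<filename>" immediately before the "?" query separator.
before, _, after = home_url.partition("?")
download_url = f"{before}files/{filename}?{after}"

print(download_url)
# -> https://community.cloud.databricks.com/files/pre-processed.csv?o=189989883924552#
```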

(screenshot: DBFS file explorer)

3 Comments

How do you get the URL to download? Can you share a generic method to download any file?
Hi Nani, if you put the path + file name in the middle of your home URL (after .com/), that should be enough; your download should start automatically. In my case, I had to insert "files/pre-processed.csv" in the middle of the home URL.
@MathGeek In Databricks (Python), I use an HTML href to access the file: from IPython.display import HTML; HTML('<a href="community.cloud.databricks.com/files/…">Get CSV</a>')
df = spark.read.format('csv').options(header='true').load('file:/Workspace/Users/walmart_stock.csv')
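The file:/ prefix here tells Spark to read from the driver's local filesystem rather than DBFS. For comparison, a pandas equivalent of the same header-inferring CSV read, sketched against a temp file (the walmart_stock.csv name is the answer's; the row of data here is made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Stand-in file; the answer reads /Workspace/Users/walmart_stock.csv on the driver.
tmp = os.path.join(tempfile.mkdtemp(), "walmart_stock.csv")
with open(tmp, "w") as f:
    f.write("Date,Close\n2012-01-03,60.33\n")

# header='infer' is pandas' default, matching Spark's options(header='true'):
# the first line becomes the column names.
df = pd.read_csv(tmp)
print(df.columns.tolist())  # ['Date', 'Close']
```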
