
I have a dataframe in the Databricks environment that I need to download to my personal machine. The dataframe contains 10,000 rows, so I tried the following:

 df_test.coalesce(1).write.csv("dbfs:/FileStore/tables/df_test", header=True, mode='overwrite')

However, I'm not able to run the cell. The following error message appears:

org.apache.spark.SparkException: Job aborted.

Could someone help me solve the problem?

  • Please post the full exception; this information isn't enough. (Jul 8, 2022 at 14:50)

1 Answer


If you haven't resolved the error, you can try this alternative to save your PySpark dataframe to your local machine as a CSV file.

With display(dataframe):

Here I created a dataframe with 10,000 rows for reference. With display(), Databricks lets you download up to 1 million rows.

Code:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Schema for the sample dataframe
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("firstname", StringType(), True),
])

# Build 10,000 rows of sample data
data = [(i, "Rakesh") for i in range(1, 10001)]

df = spark.createDataFrame(data=data, schema=schema)
df.show(5)
display(df)

Dataframe creation: (screenshot of the df.show(5) output)

display(df):

By default, display() shows the first 1,000 rows of output. To download the whole dataframe, click the down arrow on the download button and then click Download full results.


Then click Re-execute and download, and the full dataframe is downloaded to your local machine as a CSV file.
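
As a side note on your original approach (this part is my own sketch, not from the steps above): even when a coalesce(1) write succeeds, Spark creates a directory at the target path holding a single part-*.csv file, not one CSV file. Assuming the write goes through, you can locate the part file with dbutils.fs.ls and, because it sits under /FileStore, download it in a browser; the <databricks-instance> placeholder below stands for your workspace URL.

df_test.coalesce(1).write.csv("dbfs:/FileStore/tables/df_test", header=True, mode='overwrite')

# List the output directory and pick out the single part file
files = dbutils.fs.ls("dbfs:/FileStore/tables/df_test")
part_file = [f for f in files if f.name.startswith("part-")][0]
print(part_file.path)

# Files under /FileStore are downloadable in a browser at:
# https://<databricks-instance>/files/tables/df_test/<part-file-name>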

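Alternatively, for a dataframe this small, a common pattern (again a sketch, assuming your cluster exposes the /dbfs local mount) is to collect the data to pandas on the driver and write a single CSV straight into /dbfs/FileStore:

pdf = df_test.toPandas()  # fine for 10,000 rows; avoid on dataframes that don't fit in driver memory
pdf.to_csv("/dbfs/FileStore/tables/df_test.csv", index=False)

# Then download in a browser at:
# https://<databricks-instance>/files/tables/df_test.csv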
