
I have a dataframe in the Databricks environment that I need to download to my personal machine. The dataframe contains 10,000 rows, so I tried the following:

 df_test.coalesce(1).write.csv("dbfs:/FileStore/tables/df_test", header=True, mode='overwrite')

However, I'm not able to run the cell. The following error message appears:

org.apache.spark.SparkException: Job aborted.

Could someone help me solve the problem?

  • Please post the full exception; this information isn't enough. (Jul 8, 2022 at 14:50)

1 Answer


If you haven't resolved the error, you can try this alternative to save your PySpark dataframe to your local machine as a CSV file.

With display(dataframe):

Here I created a dataframe with 10,000 rows for reference. With display(), Databricks lets you download up to 1 million rows.

Code:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Schema for the sample dataframe
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("firstname", StringType(), True),
])

# Build 10,000 rows of sample data
data = [(i, "Rakesh") for i in range(1, 10001)]

df = spark.createDataFrame(data=data, schema=schema)
df.show(5)
display(df)

Dataframe creation: (screenshot of the df.show(5) output)

display(df):

By default, display() shows the first 1,000 rows of output. To download the whole dataframe, click the down arrow on the download button and then click Download full results.


Then click Re-execute and download, and the full dataframe is downloaded to your local machine as a CSV file.
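
As a side note on your original approach (this part is my own sketch, not from the steps above): even when a coalesce(1) write succeeds, Spark creates a directory at the target path holding a single part-*.csv file, not one CSV file. Assuming the write goes through, you can locate the part file with dbutils.fs.ls and, because it sits under /FileStore, download it in a browser; the <databricks-instance> placeholder below stands for your workspace URL.

df_test.coalesce(1).write.csv("dbfs:/FileStore/tables/df_test", header=True, mode='overwrite')

# List the output directory and pick out the single part file
files = dbutils.fs.ls("dbfs:/FileStore/tables/df_test")
part_file = [f for f in files if f.name.startswith("part-")][0]
print(part_file.path)

# Files under /FileStore are downloadable in a browser at:
# https://<databricks-instance>/files/tables/df_test/<part-file-name>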

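Alternatively, for a dataframe this small, a common pattern (again a sketch, assuming your cluster exposes the /dbfs local mount) is to collect the data to pandas on the driver and write a single CSV straight into /dbfs/FileStore:

pdf = df_test.toPandas()  # fine for 10,000 rows; avoid on dataframes that don't fit in driver memory
pdf.to_csv("/dbfs/FileStore/tables/df_test.csv", index=False)

# Then download in a browser at:
# https://<databricks-instance>/files/tables/df_test.csv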
