
I have a Databricks DataFrame called df. I want to write it to an S3 bucket as a CSV file. I have the S3 bucket name and other credentials. I checked the online documentation given here https://docs.databricks.com/spark/latest/data-sources/aws/amazon-s3.html#mount-aws-s3 and it says to use the following commands:

dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-s3")

dbutils.fs.put(s"/mnt/$MountName", "<file content>")

But what I have is a DataFrame and not a file. How can I achieve this?


1 Answer


I had the same problem. I found two solutions.

1st

df.write \
  .format("com.databricks.spark.csv") \
  .option("header", "true") \
  .save("s3a://{}:{}@{}/{}".format(ACCESS_KEY, SECRET_KEY, BUCKET_NAME, DIRECTORY))

Worked like a charm.
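Note that Spark writes the output as a directory containing one CSV part file per partition, not a single file. If you need a single CSV, here is a minimal sketch, assuming the same ACCESS_KEY, SECRET_KEY, BUCKET_NAME and DIRECTORY variables as above (on recent runtimes the built-in csv format replaces com.databricks.spark.csv):

# coalesce(1) forces everything into one partition, so a single part file is written
df.coalesce(1) \
  .write \
  .format("csv") \
  .option("header", "true") \
  .mode("overwrite") \
  .save("s3a://{}:{}@{}/{}".format(ACCESS_KEY, SECRET_KEY, BUCKET_NAME, DIRECTORY))

The file still lands inside that directory with a part-00000-... name, and coalesce(1) funnels all data through a single task, so only do this for reasonably small DataFrames.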

2nd

You can indeed mount an S3 bucket and then write a file to it directly like this:

#### MOUNT AND READ S3 FILES
AWS_BUCKET_NAME = "your-bucket-name"
MOUNT_NAME = "a-directory-name"
dbutils.fs.mount("s3a://%s" % AWS_BUCKET_NAME, "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s" % MOUNT_NAME))

#### WRITE FILE 

df.write.save('/mnt/{}/{}'.format(MOUNT_NAME, "another-directory-name"), format='csv')

Anything written to the mount point is stored directly in your S3 bucket.
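If you want to verify the write or detach the bucket afterwards, something like the following should work (same placeholder MOUNT_NAME and directory name as above):

# list the CSV part files that were just written through the mount
display(dbutils.fs.ls("/mnt/{}/{}".format(MOUNT_NAME, "another-directory-name")))

# detach the bucket from DBFS when you are done with it
dbutils.fs.unmount("/mnt/%s" % MOUNT_NAME)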


1 Comment

This line is missing, which is why I was unable to connect: encoded_secret_key = secret_key.replace("/", "%2F"). After adding it, I connected successfully. It is mentioned on docs.databricks.com/spark/latest/data-sources/aws/…
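For reference, a minimal sketch of the mount call with the URL-encoded secret key, along the lines of the linked docs (the key and bucket values are placeholders):

ACCESS_KEY = "<your-access-key-id>"
SECRET_KEY = "<your-secret-access-key>"
# slashes in the secret key break the s3a URI, so URL-encode them first
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "your-bucket-name"
MOUNT_NAME = "a-directory-name"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)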
