
I see that Spark 2.0.0 introduced the property spark.sql.files.maxPartitionBytes, and its subsequent release (2.1.0) introduced spark.files.maxPartitionBytes.

The Spark configuration documentation says, in the case of the former:

The maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.

Whereas in the case of the latter, it only says:

The maximum number of bytes to pack into a single partition when reading files.

Both descriptions point to one common thing: they are both used when reading files. But the second sentence of the former restricts spark.sql.files.maxPartitionBytes to file-based sources such as Parquet, JSON, and ORC.

Does that mean spark.files.maxPartitionBytes is used when reading files through low-level APIs like RDDs?
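
For reference, a minimal sketch of where I would set each property (the values below are just placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("maxPartitionBytes-demo")
      // Documented for file-based SQL sources such as Parquet, JSON and ORC
      .config("spark.sql.files.maxPartitionBytes", "134217728") // 128 MB
      // Documented only as "when reading files"
      .config("spark.files.maxPartitionBytes", "134217728")     // 128 MB
      .getOrCreate()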

1 Answer

Yes, you got it.

The .sql prefix generally implies DataFrame/Dataset scope. spark.files.maxPartitionBytes is for RDDs.
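
Assuming a SparkSession built as in the question's sketch (both properties set; the paths below are hypothetical), you can compare the partition counts of the two read paths:

    // DataFrame read: partition sizing is governed by spark.sql.files.maxPartitionBytes.
    val df = spark.read.parquet("/data/events.parquet")
    println(s"DataFrame partitions: ${df.rdd.getNumPartitions}")

    // RDD read: per this answer, spark.files.maxPartitionBytes is the relevant setting
    // here (exact behavior can vary by Spark version).
    val rdd = spark.sparkContext.textFile("/data/events.txt")
    println(s"RDD partitions: ${rdd.getNumPartitions}")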
