
I see that Spark 2.0.0 introduced the property spark.sql.files.maxPartitionBytes, and its subsequent release (2.1.0) introduced spark.files.maxPartitionBytes.

The Spark configuration documentation says, in the case of the former:

The maximum number of bytes to pack into a single partition when reading files. This configuration is effective only when using file-based sources such as Parquet, JSON and ORC.

Whereas in the case of the latter, it only says:

The maximum number of bytes to pack into a single partition when reading files.

Both descriptions point to one common thing: they are both used when reading files. But the second sentence of the former restricts spark.sql.files.maxPartitionBytes to file-based sources such as Parquet, JSON, and ORC.

Does that mean spark.files.maxPartitionBytes is used when reading files through low-level APIs like RDDs?
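
For reference, a minimal sketch of where I would set each property (the values below are just placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("maxPartitionBytes-demo")
      // Documented for file-based SQL sources such as Parquet, JSON and ORC
      .config("spark.sql.files.maxPartitionBytes", "134217728") // 128 MB
      // Documented only as "when reading files"
      .config("spark.files.maxPartitionBytes", "134217728")     // 128 MB
      .getOrCreate()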

1 Answer

Yes, you got it.

The .sql prefix generally implies DataFrame/Dataset scope. spark.files.maxPartitionBytes is for RDDs.
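
Assuming a SparkSession built as in the question's sketch (both properties set; the paths below are hypothetical), you can compare the partition counts of the two read paths:

    // DataFrame read: partition sizing is governed by spark.sql.files.maxPartitionBytes.
    val df = spark.read.parquet("/data/events.parquet")
    println(s"DataFrame partitions: ${df.rdd.getNumPartitions}")

    // RDD read: per this answer, spark.files.maxPartitionBytes is the relevant setting
    // here (exact behavior can vary by Spark version).
    val rdd = spark.sparkContext.textFile("/data/events.txt")
    println(s"RDD partitions: ${rdd.getNumPartitions}")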
