Ok so the stack trace given above is not sufficient to understand the root cause on its own, but since you mention you are using a join, that is most likely where it's happening. I faced the same issue with a join; if you dig into your stack trace you will see something like -
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#73300L])
+- *Project
+- *BroadcastHashJoin
...
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
This hints at why it's failing: Spark tries to perform the join using a Broadcast Hash Join, which has both a timeout and a broadcast size threshold, and exceeding either one causes the error above. To fix this, depending on the underlying cause -
Increase the "spark.sql.broadcastTimeout", default is 300 sec -
from pyspark.sql import SparkSession

# Allow the broadcast exchange more time to complete (default is 300 seconds)
spark = SparkSession \
    .builder \
    .appName("AppName") \
    .config("spark.sql.broadcastTimeout", "1800") \
    .getOrCreate()
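You can check that the value took effect with spark.conf.get (a quick sketch using the spark session built above):

spark.conf.get("spark.sql.broadcastTimeout")  # returns '1800'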
Or increase the broadcast threshold, "spark.sql.autoBroadcastJoinThreshold" (the default is 10 MB) -
# Raise the threshold; the value is in bytes (20485760 bytes is roughly 20 MB)
spark = SparkSession \
    .builder \
    .appName("AppName") \
    .config("spark.sql.autoBroadcastJoinThreshold", "20485760") \
    .getOrCreate()
Or disable broadcast joins altogether by setting the value to -1 -
# -1 disables broadcast joins, so Spark falls back to a sort-merge join
spark = SparkSession \
    .builder \
    .appName("AppName") \
    .config("spark.sql.autoBroadcastJoinThreshold", "-1") \
    .getOrCreate()
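If the session is already running you don't have to rebuild it; both settings are runtime SQL confs, so (as a minimal sketch, assuming spark is your existing SparkSession) they can be changed on the fly:

# Change the same settings on an existing session at runtime
spark.conf.set("spark.sql.broadcastTimeout", "1800")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")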
More details can be found here - https://spark.apache.org/docs/latest/sql-performance-tuning.html
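One more option worth knowing: instead of a session-wide threshold, you can control broadcasting per join with the broadcast hint from pyspark.sql.functions. A minimal sketch, with two hypothetical DataFrames df_large and df_small built here just for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("AppName").getOrCreate()

# Hypothetical example data, just for illustration
df_large = spark.range(1000000).withColumnRenamed("id", "key")
df_small = spark.range(100).withColumnRenamed("id", "key")

# Explicitly broadcast only the small side for this one join,
# independent of the session-wide threshold
joined = df_large.join(broadcast(df_small), on="key")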