I keep running into this issue when running PySpark.
I was able to connect to my database and retrieve data, but whenever I try to do operations like .show() or .count(), or when I try to save a Spark DataFrame to a CSV, it crashes with the following error tracebacks.
(Note: I am using SparkSession.builder)
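For reference, here is a minimal sketch of what I'm running (the JDBC URL, table name, and credentials are placeholders, not my real values):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("repro") \
    .getOrCreate()

# Read from the database over JDBC (connection details are placeholders)
df = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "my_table") \
    .option("user", "user") \
    .option("password", "password") \
    .load()

df.show()                              # crashes here
df.count()                             # ...or here
df.write.csv("/tmp/out", header=True)  # ...or here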
Error 1:
py4j.protocol.Py4JJavaError: An error occurred while calling o121.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 0.0 failed 1 times, most recent failure: Lost task 13.0 in stage 0.0 (TID) ( driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:624)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:599)
Error 2:
25/09/29 15:21:22 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID) ( executor driver): TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 4 in stage 2.0 failed 1 times, most recent failure: Lost task 4.0 in stage 2.0 (TID 4) (executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:624)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:599)
"Python worker exited unexpectedly (crashed)" can happen for multiple reasons. One is an incompatible Python or Java version; another is an incorrect configuration. Try looking into those first.
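If it is a version mismatch, one quick check is to pin the driver and worker to the same Python interpreter before building the SparkSession. PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the standard PySpark environment variables; the rest below is just an illustrative sketch:

import os
import sys

# The driver and the Python workers must run the same interpreter;
# a mismatch is a classic cause of "Python worker exited unexpectedly".
print(sys.version)

# Pin both sides to the same interpreter BEFORE creating the SparkSession.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()
print(spark.version)  # cross-check this Spark release against the supported
                      # Java versions in the docs; `java -version` in a shell
                      # shows which Java the JVM side is actually using.

Setting these before getOrCreate() matters because the worker interpreter is fixed when the session starts; changing the variables afterwards has no effect on an already-running session.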