I have a properly synced PySpark client and Spark installation: both are version 3.3.1 (shown below). The full exception message is:
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
This has been identified in another Stack Overflow post as most likely due to a version mismatch between the PySpark caller and the Spark backend. I agree that seems the likely cause, but I have carefully verified that both sides are at the same version.
PySpark and Spark are the same version:
Python 3.10.13 (main, Aug 24 2023, 22:48:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]
In [1]: import pyspark
In [2]: print(f"PySpark version: {pyspark.__version__}")
PySpark version: 3.3.1
Spark was installed by downloading the 3.3.1 .tgz directly from the Apache site and extracting it. SPARK_HOME was pointed to that directory and $SPARK_HOME/bin was added to the PATH.
$ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Inside the Python script the version has been verified as well:
pyspark version: 3.3.1
But the script fails with a PySpark / Spark error:
An error occurred while calling None.org.apache.spark.api.python.PythonFunction
py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
    at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:180)
So what else might be going on here? Is there some way I'm not seeing in which the Spark and PySpark versions might be out of sync?
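One further check I could run from inside the script is sketched below. It is not part of my original code, and the helper names `jar_spark_version` and `versions_agree` are made up for illustration. It assumes the Spark directory contains a `jars` folder with a file named like `spark-core_2.12-3.3.1.jar`, and it reads the Spark version from that file name so it can be compared with the version reported on the Python side. It only tells me which SPARK_HOME the Python process sees and what is in it; it does not prove which jars the JVM actually loads.

```python
import os
import re
from pathlib import Path


def jar_spark_version(spark_dir):
    """Return the Spark version embedded in the spark-core jar name, or None."""
    jars = Path(spark_dir) / "jars"
    if not jars.is_dir():
        return None
    for jar in sorted(jars.glob("spark-core_*.jar")):
        # Jar names look like spark-core_2.12-3.3.1.jar:
        # Scala version first, then the Spark version.
        match = re.fullmatch(r"spark-core_[\d.]+-(.+)\.jar", jar.name)
        if match:
            return match.group(1)
    return None


def versions_agree(python_version, spark_dir):
    """True when the jar version under spark_dir equals python_version."""
    return jar_spark_version(spark_dir) == python_version


if __name__ == "__main__":
    # SPARK_HOME as seen by this Python process, which may differ from the shell's.
    spark_home = os.environ.get("SPARK_HOME")
    print("SPARK_HOME seen by this process:", spark_home)
    if spark_home:
        print("spark-core jar version:", jar_spark_version(spark_home))
```

In the script itself, the comparison would be something like `versions_agree(pyspark.__version__, os.environ["SPARK_HOME"])`.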
Regarding notebooktoscript: it is the Python code exported from an earlier notebook. I'll add details about how Spark was installed.