
I have a properly synced PySpark client / Spark installation: both versions are 3.3.1 (shown below). The full exception message is:

py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist

This has been identified in another SO post as most likely being due to a version mismatch between the PySpark invoker/caller and the Spark backend. I agree that seems the likely cause, but I have carefully verified that both sides of the equation are equal:

PySpark and Spark are the same version:

Python 3.10.13 (main, Aug 24 2023, 22:48:59) [Clang 14.0.3 (clang-1403.0.22.14.1)]

In [1]: import pyspark

In [2]: print(f"PySpark version: {pyspark.__version__}")
PySpark version: 3.3.1

Spark was installed by downloading the version 3.3.1 .tgz directly from the Apache site and untarring it. SPARK_HOME was pointed to that directory and $SPARK_HOME/bin was added to the PATH.
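
As a sanity check that the shell tools resolve to that same installation, something like this sketch can be run (it only prints whatever your environment supplies):

    import os
    import shutil

    # The directory the Spark launch scripts treat as the installation root
    print("SPARK_HOME:", os.environ.get("SPARK_HOME"))

    # Which binaries the PATH actually resolves; these should live under $SPARK_HOME/bin
    print("spark-shell:", shutil.which("spark-shell"))
    print("spark-submit:", shutil.which("spark-submit"))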

$ spark-shell --version

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Inside the Python script the version has been verified as well:

pyspark version: 3.3.1
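
Note that pyspark.__version__ only reports the Python-side package. One way to compare it against the JVM backend that actually gets launched is a sketch like the following (this assumes a local session can still be created; the PythonFunction constructor error typically fires only later, when a Python function is shipped to the JVM):

    import pyspark
    from pyspark.sql import SparkSession

    # Python-side package version
    print("pyspark:", pyspark.__version__)

    spark = SparkSession.builder.master("local[1]").appName("version-check").getOrCreate()
    # spark.version is reported by the JVM backend itself
    print("JVM Spark:", spark.version)
    spark.stop()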

But the script blows up with a PySpark / Spark error:

An error occurred while calling None.org.apache.spark.api.python.PythonFunction

py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist
    at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:180)

So... what else might be going on here? Is there some way I'm not seeing in which the versions of Spark/PySpark might be out of sync?

  • Can you add a bit more information about your setup? How did you install Spark? And since you're using notebooks, how did you make the link between your notebook and Spark? Are you running the notebook inside of a virtual environment? Commented Oct 3, 2023 at 18:34
  • @Koedlt I corrected "notebook" to "script". It is the exported Python code from an earlier notebook. I'll add details about how Spark was installed. Commented Oct 3, 2023 at 18:52

1 Answer


This turned out to be a PyCharm situation. It looks like I had not restarted the IDE after switching between Spark versions: it had remembered an earlier default (the Homebrew one) of 3.5.0.
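
In case it helps someone else, a quick check to run from the same PyCharm run configuration, to see which installation the interpreter actually picked up (a minimal sketch):

    import os
    import pyspark

    # Which pyspark package was imported, and from where on disk
    print("pyspark version:", pyspark.__version__)
    print("pyspark location:", pyspark.__file__)

    # What the run configuration's (possibly stale) environment points at
    print("SPARK_HOME:", os.environ.get("SPARK_HOME"))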
