
I am trying to follow this Python notebook. I installed PySpark directly in the notebook (!pip install pyspark), but when I run:

from pyspark.sql import SparkSession

# spark config
spark = SparkSession \
    .builder \
    .appName("question recommendation") \
    .config("spark.driver.maxResultSize", "96g") \
    .config("spark.driver.memory", "96g") \
    .config("spark.executor.memory", "8g") \
    .config("spark.master", "local[12]") \
    .getOrCreate()
sc = spark.sparkContext

I get a RuntimeError on the first statement:

RuntimeError                              Traceback (most recent call last)
<ipython-input-17-1b87e1472109> in <module>
      1 # spark config
----> 2 spark = SparkSession \
      3     .builder \
      4     .appName("question recommendation") \
      5     .config("spark.driver.maxResultSize", "96g") \

~\anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--> 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

~\anaconda3\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

~\anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~\anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

~\anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
    106 
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise RuntimeError("Java gateway process exited before sending its port number")
    109 
    110             with open(conn_info_file, "rb") as info:

RuntimeError: Java gateway process exited before sending its port number

I am very new to Apache Spark. Have I installed something incorrectly? Should I have installed it via conda instead? Is there anything on my system that I need to check?

1 Answer

The main clue to the error is in the last line of the traceback:

"RuntimeError: Java gateway process exited before sending its port number"

For possible solutions, see this older Stack Overflow question:

Pyspark: Exception: Java gateway process exited before sending the driver its port number
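
In most cases this error means the Java gateway could not start because PySpark cannot find a usable JVM: either Java is not installed, or JAVA_HOME does not point at a supported JDK. A quick check is running java -version in a terminal. Below is a minimal sketch of the common fix, assuming a Windows setup like your traceback suggests; the JDK path is hypothetical, so substitute your actual install location:

import os

# Assumption: a supported JDK (8 or 11 for Spark 3.x) is installed at this
# path -- adjust it to wherever your JDK actually lives.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11"
# Make sure the JVM launcher is on PATH too (Windows-style separator).
os.environ["PATH"] = os.environ["JAVA_HOME"] + r"\bin;" + os.environ["PATH"]

from pyspark.sql import SparkSession

# Retry the session now that the Java gateway can find a JVM.
spark = (
    SparkSession.builder
    .appName("question recommendation")
    .config("spark.master", "local[12]")
    .getOrCreate()
)
print(spark.version)

Setting the variables in-process like this also helps when java -version works in a regular terminal but the notebook's environment does not inherit JAVA_HOME.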
