
I am new to Spark.

Thanks for your attention and your help. Here is my problem: I started spark-shell (version 3.2.1) in local mode on macOS Catalina (10.15.7), then ran the code below, and no exceptions occurred:

import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD

val seq: Seq[(String, Int)] = Seq(("Bob", 14), ("Alice", 18))
val rdd: RDD[(String, Int)] = sc.parallelize(seq)
val schema: StructType = StructType(Array(StructField("name", StringType), StructField("age", IntegerType)))
val rowRDD: RDD[Row] = rdd.map(fields => Row(fields._1, fields._2))
val dataFrame: DataFrame = spark.createDataFrame(rowRDD, schema)

Finally, I ran

dataFrame.show

and I got this

22/06/22 11:18:19 ERROR util.Utils: Aborting task
java.io.IOException: Failed to connect to /192.168.1.3:50561
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
    at org.apache.spark.repl.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
    at org.apache.spark.repl.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
    at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
3 Comments

  • Hello mate, it seems your Spark is trying to reach an address on your network. Try exporting this variable before you start spark-shell: export SPARK_LOCAL_IP="127.0.0.1" - Commented Jun 22, 2022 at 6:26
  • Thanks a lot! It works! I wonder what this variable is for. Can you explain it to me? - Commented Jun 22, 2022 at 7:14
  • Just post the answer below, mate! Enjoy the world of Spark! - Commented Jun 22, 2022 at 7:43

1 Answer


Let me explain why the first steps didn't throw the error, and why show was the call that actually surfaced it.

Spark operations fall into two categories: transformations and actions.

The first part of your code consists entirely of transformations: you are telling your Spark driver to analyse them and make sure they will work, but nothing is executed until you call an action.

What is an action? Anything that produces output, like show(), which prints the data to the screen, or write(), which writes output files.
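To make the distinction concrete, here is a minimal spark-shell sketch (hypothetical session; the variable names are illustrative):

```scala
// Transformations are recorded lazily; actions trigger execution.
val rdd = sc.parallelize(Seq(("Bob", 14), ("Alice", 18)))

// Transformation: only builds the lineage, no tasks run yet,
// so a problem that only surfaces at runtime stays hidden here.
val mapped = rdd.map { case (name, age) => (name, age + 1) }

// Action: now Spark schedules tasks, and any driver/executor
// communication problem shows up at this point.
mapped.collect().foreach(println)
```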

Your problem was this: the transformations run in your local driver, so nothing needs to be sent anywhere. But once you trigger an action, Spark runs tasks whose executors fetch the REPL-compiled classes from the driver over the network (that is the ExecutorClassLoader call in your stack trace), and that connection to 192.168.1.3 failed. To avoid the problem, set this environment variable before starting the shell:

$ export SPARK_LOCAL_IP="127.0.0.1"
$ spark-shell

This tells your application to bind to the loopback address, since you are running locally and have no cluster to send the work to.


3 Comments

Just to add on, you can also set the SPARK_LOCAL_IP variable inside $SPARK_HOME/conf/spark-env.sh. If the config file does not exist, you can make a copy of $SPARK_HOME/conf/spark-env.sh.template.
The strange thing is, one of our Spark applications used to work locally (run via IDE) and then stopped working; I wonder if it was due to too little data.
Holy moly, I'm glad to have stumbled on this! I'd been banging my head trying to figure this out for like 2 hours. I had code running in the interpreter just fine, but it was failing with this error when run via spark-submit (i.e. from a script file).
