
I am new to Spark.

Thanks for your attention and your help. Here is my problem: I started spark-shell (version 3.2.1) in local mode on macOS Catalina (10.15.7), then ran the code below, and no exceptions occurred:

import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD

val seq: Seq[(String, Int)] = Seq(("Bob", 14), ("Alice", 18))
val rdd: RDD[(String, Int)] = sc.parallelize(seq)
val schema: StructType = StructType(Array(StructField("name", StringType), StructField("age", IntegerType)))
val rowRDD: RDD[Row] = rdd.map(fields => Row(fields._1, fields._2))
val dataFrame: DataFrame = spark.createDataFrame(rowRDD, schema)

Finally, I ran

dataFrame.show

and I got this

22/06/22 11:18:19 ERROR util.Utils: Aborting task
java.io.IOException: Failed to connect to /192.168.1.3:50561
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
    at org.apache.spark.repl.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
    at org.apache.spark.repl.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
    at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
3 Comments

  • Hello mate, it seems your Spark is trying to reach an address on your network. Try exporting this variable before you start spark-shell: export SPARK_LOCAL_IP="127.0.0.1" - Commented Jun 22, 2022 at 6:26
  • Thanks a lot! It works! I wonder what this variable is for. Can you explain it to me? - Commented Jun 22, 2022 at 7:14
  • Just post the answer below, mate! Enjoy the world of Spark! - Commented Jun 22, 2022 at 7:43

1 Answer


Let me explain why the first steps didn't throw the error, and why show was the call that actually surfaced it.

Spark operations fall into two categories: transformations and actions.

The first part of your code consists entirely of transformations: you are telling your Spark driver to analyse them and make sure they will work, but nothing is executed until you call an action.

What is an action? Anything that produces output, like show(), which prints the data to the screen, or write(), which writes output files.
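To make the distinction concrete, here is a minimal spark-shell sketch (hypothetical session; the variable names are illustrative):

```scala
// Transformations are recorded lazily; actions trigger execution.
val rdd = sc.parallelize(Seq(("Bob", 14), ("Alice", 18)))

// Transformation: only builds the lineage, no tasks run yet,
// so a problem that only surfaces at runtime stays hidden here.
val mapped = rdd.map { case (name, age) => (name, age + 1) }

// Action: now Spark schedules tasks, and any driver/executor
// communication problem shows up at this point.
mapped.collect().foreach(println)
```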

Your problem was this: the transformations run in your local driver, so nothing needs to be sent anywhere. But once you trigger an action, Spark runs tasks whose executors fetch the REPL-compiled classes from the driver over the network (that is the ExecutorClassLoader call in your stack trace), and that connection to 192.168.1.3 failed. To avoid the problem, set this environment variable before starting the shell:

$ export SPARK_LOCAL_IP="127.0.0.1"
$ spark-shell

This tells your application to bind to the loopback address, since you are running locally and have no cluster to send the work to.


3 Comments

Just to add on, you can also set the SPARK_LOCAL_IP variable inside $SPARK_HOME/conf/spark-env.sh. If the config file does not exist, you can make a copy of $SPARK_HOME/conf/spark-env.sh.template.
The strange thing is, one of our Spark applications used to work locally (run via IDE) and then stopped working; I wonder if it was due to too little data.
Holy moly, I'm glad to have stumbled on this! I'd been banging my head trying to figure this out for like 2 hours. I had code running in the interpreter just fine, but it was failing with this error when run via spark-submit (i.e. from a script file).
