I'm not sure what's wrong with the code below, but it throws the following error:

org.apache.spark.SparkException: Task not serializable

I searched for the error but couldn't resolve it.

Here is the code (it can be pasted into a new Scala notebook on community.cloud.databricks.com and run):

    import com.google.gson._
    import org.apache.spark.rdd.RDD

    object TweetUtils {
      case class Tweet(
        id: String,
        user: String,
        userName: String,
        text: String,
        place: String,
        country: String,
        lang: String
      )

      // Parses one partition of JSON lines, reusing a single Gson instance.
      def parseFromJson(lines: Iterator[String]): Iterator[Tweet] = {
        val gson = new Gson
        lines.map(line => gson.fromJson(line, classOf[Tweet]))
      }

      // sc is the SparkContext provided by the notebook.
      def loadData(): RDD[Tweet] = {
        val pathToFile = "/FileStore/tables/reduced_tweets-57570.json"
        sc.textFile(pathToFile).mapPartitions(parseFromJson(_))
      }

      def tweetsByUser(): RDD[(String, Iterable[Tweet])] = {
        val tweets = loadData()
        tweets.groupBy(_.user)
      }
    }

    val res = TweetUtils.tweetsByUser()
    res.collect().take(5).foreach(println)

Below is the detailed error message:

at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2548)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:392)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:826)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$TweetUtils$.loadData(command-3696793732897971:22)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$TweetUtils$.tweetsByUser(command-3696793732897971:25)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3696793732897971:30)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3696793732897971:84)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3696793732897971:86)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3696793732897971:88)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw$$iw.<init>(command-3696793732897971:90)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw$$iw.<init>(command-3696793732897971:92)
    at line05db6d250e4b42e2b2c1d6b97ba83df533.$read$$iw$$iw.<init>(command-3696793732897971:94)

Thanks in advance,

Sri

  • Could you try moving the case class Tweet out of the TweetUtils object? TweetUtils is not serializable. Commented Feb 18, 2020 at 15:32
  • Hi Artem Aliev, thanks for your reply. I tried moving the case class Tweet outside the object, but I still get the same "Task not serializable" error. Commented Feb 20, 2020 at 8:04

2 Answers

What finally worked was applying the suggestions from Artem Aliev and Partha together: moving the case class Tweet out of the TweetUtils object, and also making the object serializable by declaring it as "object TweetUtils extends Serializable".

Thanks to both of you.
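
For reference, the combined change might look like the sketch below. It is reconstructed from the code in the question rather than copied from a verified run, and it assumes the notebook supplies sc (the SparkContext) and that the file path from the question exists. The only differences from the original are that Tweet is declared outside TweetUtils and that TweetUtils extends Serializable.

```scala
import com.google.gson.Gson
import org.apache.spark.rdd.RDD

// Declared at the top level of the notebook cell instead of inside TweetUtils.
case class Tweet(
  id: String,
  user: String,
  userName: String,
  text: String,
  place: String,
  country: String,
  lang: String
)

// Marked Serializable so the closure passed to mapPartitions, which refers
// to this object, can be serialized and shipped to the executors.
object TweetUtils extends Serializable {

  // Gson is created inside the method, so it is built on the executor
  // for each partition and never needs to be serialized itself.
  def parseFromJson(lines: Iterator[String]): Iterator[Tweet] = {
    val gson = new Gson
    lines.map(line => gson.fromJson(line, classOf[Tweet]))
  }

  // Assumption: sc is the SparkContext provided by the notebook.
  def loadData(): RDD[Tweet] = {
    val pathToFile = "/FileStore/tables/reduced_tweets-57570.json"
    sc.textFile(pathToFile).mapPartitions(parseFromJson(_))
  }

  def tweetsByUser(): RDD[(String, Iterable[Tweet])] =
    loadData().groupBy(_.user)
}

val res = TweetUtils.tweetsByUser()
res.collect().take(5).foreach(println)
```

According to the comments in this thread, neither change was enough on its own: moving the case class alone still gave "Task not serializable", and extending Serializable alone gave "Malformed class name".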

Make your TweetUtils object Serializable and it should work:

    object TweetUtils extends Serializable
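
To illustrate what the marker changes, here is a minimal sketch that needs no Spark. Spark checks closures with Java serialization, so the example simply asks Java serialization whether it accepts an object. The names SerializationCheck, isJavaSerializable, PlainUtils and SerializableUtils are hypothetical and exist only for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {

  // Hypothetical helper: returns true if Java serialization accepts the value.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // A plain Scala object does not implement java.io.Serializable.
  object PlainUtils {
    def double(x: Int): Int = x * 2
  }

  // The same object with the marker trait added.
  object SerializableUtils extends Serializable {
    def double(x: Int): Int = x * 2
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(PlainUtils))        // false
    println(isJavaSerializable(SerializableUtils)) // true
  }
}
```

The outputs in the comments apply when this is compiled and run as a standalone program. In a notebook or REPL, definitions are wrapped in generated classes (the $iw entries in the stack trace), so an object can also drag its wrapper into serialization and the result may differ.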

Hi Partha, I tried making the object serializable, but I get the error "Malformed class name".
