0

I have a following class that reads csv data into Spark's Dataset. Everything works fine if I just simply read and return the data.

However, if I apply a MapFunction to the data before returning from function, I get

Exception in thread "main" org.apache.spark.SparkException: Task not serializable

Caused by: java.io.NotSerializableException: com.Workflow.

I know Spark's working and its need to serialize objects for distributed processing, however, I'm NOT using any reference to Workflow class in my mapping logic. I'm not calling any Workflow class function in my mapping logic. So why is Spark trying to serialize Workflow class? Any help will be appreciated.

public class Workflow {

    private final SparkSession spark;   

    public Dataset<Row> readData(){
        final StructType schema = new StructType()
            .add("text", "string", false)
            .add("category", "string", false);

        Dataset<Row> data = spark.read()
            .schema(schema)
            .csv(dataPath);

        /* 
         * works fine till here if I call
         * return data;
         */

        Dataset<Row> cleanedData = data.map(new MapFunction<Row, Row>() {
            public Row call(Row row){
                /* some mapping logic */
                return row;
            }
        }, RowEncoder.apply(schema));

        cleanedData.printSchema();
        /* .... ERROR .... */
        cleanedData.show();

        return cleanedData;
    }
}
1

2 Answers 2

1

anonymous inner classes have a hidden/implicit reference to enclosing class. use Lambda expression or go with Roma Anankin's solution

Sign up to request clarification or add additional context in comments.

Comments

0

you could make Workflow implement Serializeble and SparkSession as @transient

1 Comment

while this may work, it does not answer the question i.e. why is Spark trying to serialize Workflow when I'm not making any reference to it in the map function.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.