My Code:

finalJoined.show();

Encoder<Row> rowEncoder = Encoders.bean(Row.class);                             
Dataset<Row> validatedDS = finalJoined.map(row -> validationRowMap(row), rowEncoder);       
validatedDS.show();

Map function :

public static Row validationRowMap(Row row) {

    //PART-A validateTxn()

    System.out.println("Inside map");
    //System.out.println("Value of CIS_DIVISION is " + row.getString(7));

    //1. CIS_DIVISION
    if ((row.getString(7)) == null || (row.getString(7)).trim().isEmpty()) {
        System.out.println("CIS_DIVISION cannot be blank.");
    }

    return row;
}

Output :

The finalJoined Dataset<Row> is shown correctly, with all columns and rows holding the proper values. However, the validatedDS Dataset<Row> is shown with only one column, and its values are empty.

Expected output :

validatedDS should show the same values as the finalJoined dataset, because I only perform validation inside the map function and do not change the rows themselves.

Please let me know if you need more information.

1 Answer

Encoders.bean is intended for use with Java bean classes. Row is not one of these: it does not define getters and setters for specific fields, only generic getters such as getString(int).

To return a Row object from map, you have to use RowEncoder and provide the expected output schema.
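As a minimal sketch of that fix, assuming a Spark version where RowEncoder.apply(schema) is available (it is in the 2.x line and in 3.x up to 3.4; newer releases are reported to offer Encoders.row(schema) instead, so check the API of your version), the encoder can be built from the input schema. Reusing finalJoined.schema() is an assumption that only holds because validationRowMap returns each row unchanged; a map that adds or drops fields would need its own StructType.

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;

// Build the encoder from the input schema: the map returns rows unchanged,
// so the output schema is the same as the schema of finalJoined.
Encoder<Row> rowEncoder = RowEncoder.apply(finalJoined.schema());

// The explicit MapFunction cast selects the Java overload of Dataset.map.
Dataset<Row> validatedDS = finalJoined.map(
        (MapFunction<Row, Row>) row -> validationRowMap(row), rowEncoder);
validatedDS.show();
```

With an encoder that matches the actual row layout, validatedDS should keep the columns of finalJoined instead of collapsing to a single empty column.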

See, for example, the question "Encoder for Row Type Spark Datasets".
