I'm writing a UDAF (user-defined aggregate function) and I want it to return either a struct with named fields (e.g. start and end, both of long type) or two columns.

In the evaluate function I tried returning a map type and an array, but neither gave the result I was expecting.

Would love to get a clue about it. Thanks

1 Answer

The simplest way to do that is to return a List holding your values in a single column and then expand that column into several columns.

Here is an example in which the UDAF returns two Integer values:


UDAF (important code parts)


public YourUDAFName(someParams) {
    [...]
    // Declare the return type as an array of integers.
    _returnDataType = DataTypes.createArrayType(DataTypes.IntegerType);
}
[...]
@Override
public Object evaluate(Row buffer) {
    List<Integer> output = new ArrayList<>();
    output.add(1); // Put your own logic here...
    output.add(5); // ...one entry per value to return.
    return output;
}

Example of use...


Dataset<Row> ds = getYourDatasetHere();
YourUDAFName udaf = new YourUDAFName(someParams);
// groupBy(...).agg(...) returns a new Dataset, so keep the result.
ds = ds.groupBy("yourGroupByKey")
    .agg(udaf.apply(
        col("someColumnFromDs"),
        col("someOtherColumn")).as("columnWithList"));

// Here we expand "columnWithList" into one column per list element.
// numElementInTheList is the known list size (2 in this example).
for (int i = 0; i < numElementInTheList; i++) {
    ds = ds.withColumn("nameOfYourExpandedColumn" + i, ds.col("columnWithList").getItem(i));
}
ds.show();
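Since the question asks for a struct with named fields, another option may be to declare the UDAF's return type as a struct and return a Row from evaluate. The following is only a minimal sketch, assuming Spark's UserDefinedAggregateFunction API is on the classpath. The class name RangeUdaf, the min/max logic, and the input field name "value" are hypothetical and only there for illustration:

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical UDAF: aggregates a long column into a struct {start, end}
// holding the smallest and largest value seen in the group.
class RangeUdaf extends UserDefinedAggregateFunction {
    // Struct with two named long fields, used as buffer and return type.
    private static final StructType RANGE = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("start", DataTypes.LongType, false),
        DataTypes.createStructField("end", DataTypes.LongType, false)
    });

    @Override
    public StructType inputSchema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("value", DataTypes.LongType, true)
        });
    }

    @Override
    public StructType bufferSchema() { return RANGE; }

    @Override
    public DataType dataType() { return RANGE; }

    @Override
    public boolean deterministic() { return true; }

    @Override
    public void initialize(MutableAggregationBuffer buffer) {
        buffer.update(0, Long.MAX_VALUE); // start: running minimum
        buffer.update(1, Long.MIN_VALUE); // end: running maximum
    }

    @Override
    public void update(MutableAggregationBuffer buffer, Row input) {
        if (input.isNullAt(0)) {
            return; // ignore null inputs
        }
        long v = input.getLong(0);
        buffer.update(0, Math.min(buffer.getLong(0), v));
        buffer.update(1, Math.max(buffer.getLong(1), v));
    }

    @Override
    public void merge(MutableAggregationBuffer buffer1, Row buffer2) {
        buffer1.update(0, Math.min(buffer1.getLong(0), buffer2.getLong(0)));
        buffer1.update(1, Math.max(buffer1.getLong(1), buffer2.getLong(1)));
    }

    @Override
    public Object evaluate(Row buffer) {
        // Returning a Row that matches dataType() yields a struct column.
        return RowFactory.create(buffer.getLong(0), buffer.getLong(1));
    }
}
```

With this sketch, something like ds.groupBy("yourGroupByKey").agg(new RangeUdaf().apply(col("someColumnFromDs")).as("range")) should give a single struct column, and selecting col("range.start") and col("range.end") afterwards should turn its fields into two separate long columns.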

I hope that helps you!
