1
col1 col2
[1,2,3,4] [0,1,0,3]
[5,6,7,8] [0,3,4,8]

desired result:

col1 col2
[6,8,10,12] [0,4,4,11]

In snowflake's snowpark this is relatively straight forward using array_construct. Apache Spark has a similar array function but there is a major difference.

In snowpark, I can do array_construct(count('*'), sum(col('x')), sum(col('y'), count(col('y'))) but apache spark seems to count array() as an aggregation and complains that I can't have an aggregation inside of an aggregation.

pyspark.sql.utils.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;

I'm trying to write a piece of code that can handle both snowpark and apache spark but this array_construct vs array is proving trickier than anticipated. Next up is to explore doing a groupby & collect_list but wondering how others have solved this?

1

2 Answers 2

1

If you treat the aggregations as delimited strings, you can split them to an array

select a, split(concat_ws(',',count(b),sum(c)),',')
from t
group by a;
Sign up to request clarification or add additional context in comments.

Comments

0

Turns out it's not the difference between array functions at all, it was a completely different issue.

I have a translation function, that translates snowpark's regexp_count function to pyspark using the formula

    def regexp_count(self, col, regex):
        return F.sum(F.when(col.rlike(regex), 1).otherwise(0))

This causes issues later on when I did F.sum(regexp_count(col, regex)) hence the error, because as becomes obvious, there is an F.sum inside an F.sum :/

Instead, the following translation fixes the issue

    def regexp_count(self, col, regex):
        return F.size(F.split(col, regex)) - 1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.