Actually I am working on pyspark code. My dataframe is
+-------+--------+--------+--------+--------+
|element|collect1|collect2|collect3|collect4|
+-------+--------+--------+--------+--------+
|A1 | 1.02 | 2.6 | 5.21 | 3.6 |
|A2 | 1.61 | 2.42 | 4.88 | 6.08 |
|B1 | 1.66 | 2.01 | 5.0 | 4.3 |
|C2 | 2.01 | 1.85 | 3.42 | 4.44 |
+-------+--------+--------+--------+--------+
I need to find the mean and stddev for each element by aggregating all the collectX columns. The final result should be as below.
+-------+--------+--------+
|element|mean |stddev |
+-------+--------+--------+
|A1 | 3.11 | 1.76 |
|A2 | 3.75 | 2.09 |
|B1 | 3.24 | 1.66 |
|C2 | 2.93 | 1.23 |
+-------+--------+--------+
The below code breakdown all the mean at individual columns df.groupBy("element").mean().show(). Instead of doing for each column, is it possible to rollup for all the columns?
+-------+-------------+-------------+-------------+-------------+
|element|avg(collect1)|avg(collect2)|avg(collect3)|avg(collect4)|
+-------+-------------+-------------+-------------+-------------+
|A1 | 1.02 | 2.6 | 5.21 | 3.6 |
|A2 | 1.61 | 2.42 | 4.88 | 6.08 |
|B1 | 1.66 | 2.01 | 5.0 | 4.3 |
|C2 | 2.01 | 1.85 | 3.42 | 4.44 |
+-------+-------------+-------------+-------------+-------------+
I tried to make use of the describe function as it has the complete aggregation functions but still shown as individual column df.groupBy("element").mean().describe().show()
thanks