collect_list() gives you an array of values.
A. If you want to collect all values of one column (say c2) grouped by another column (say c1), group by c1 and aggregate c2 with collect_list:
from pyspark.sql.functions import collect_list

df = spark.createDataFrame([
    ('emma', 'math'),
    ('emma', 'english'),
    ('mia', 'english'),
    ('mia', 'science'),
    ('mona', 'math'),
    ('mona', 'geography')
], ["student", "subject"])

# one row per student, with an array of that student's subjects
df1 = df.groupBy('student').agg(collect_list('subject'))
df1.show()
B. If you want all values of c2 regardless of any other column, you can group by a literal:
from pyspark.sql.functions import lit

# a constant grouping key collapses all rows into a single group
df1 = df.groupBy(lit(1)).agg(collect_list('subject'))
df1.show()
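A slightly simpler variant, if you only need the single array, is to aggregate the whole DataFrame without any group key; agg() without groupBy treats all rows as one group:

# same result as grouping by a literal, continuing with df from above
all_subjects = df.agg(collect_list('subject'))
all_subjects.show()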
Note that the resulting column is not a Python list; its Spark data type is pyspark.sql.types.ArrayType.
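If you do need a regular Python list on the driver, you can collect the aggregated rows. A minimal sketch, continuing with df from above (the alias 'subjects' is just an illustrative name):

# collect() pulls the rows to the driver; each ArrayType cell arrives as a Python list
rows = df.groupBy('student').agg(collect_list('subject').alias('subjects')).collect()
subjects_by_student = {row['student']: row['subjects'] for row in rows}
print(subjects_by_student)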