I have a Spark (Python) DataFrame with two columns: a user ID and an array of arrays, which Spark shows as a wrapped array like so:
[WrappedArray(9, 10, 11, 12), WrappedArray(20, 21, 22, 23, 24, 25, 26)]
As plain nested lists, this would be:
[[9, 10, 11, 12], [20, 21, 22, 23, 24, 25, 26]]
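
For reference, here is a minimal way to build a DataFrame with this shape (the column names `user_id` and `arrays` are just placeholders for my real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder data with the same shape as my real DataFrame:
# one user ID column and one array-of-arrays column.
df = spark.createDataFrame(
    [(1, [[9, 10, 11, 12], [20, 21, 22, 23, 24, 25, 26]])],
    ["user_id", "arrays"],
)

df.printSchema()
# prints roughly:
# root
#  |-- user_id: long (nullable = true)
#  |-- arrays: array (nullable = true)
#  |    |-- element: array (containsNull = true)
#  |    |    |-- element: long (containsNull = true)
```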
I want to perform operations on each of the sub-arrays, for example take a third list and check whether any of its values appear in the first sub-array, but I can't seem to find solutions for PySpark 2.0 (only older, Scala-specific solutions like this and this).
How does one access (and in general work with) wrapped arrays? What is an efficient way to do what I described above?
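
For concreteness, something like the sketch below is what I have in mind (with `lookup` standing in for the third list), but I'm not sure whether a plain Python UDF is the right or an efficient approach here:

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

# The "third list" I want to compare against the first sub-array.
lookup = [10, 30]

def first_subarray_overlaps(arrays):
    # I assume the column arrives here as a plain list of lists,
    # so ordinary Python set operations should work.
    if not arrays:
        return False
    return bool(set(lookup) & set(arrays[0]))

overlaps_udf = udf(first_subarray_overlaps, BooleanType())

df.withColumn("overlaps_first", overlaps_udf("arrays")).show()
```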