I have a Spark (Python) dataframe with two columns: a user ID and then an array of arrays, which is represented in Spark as a wrapped array like so:

[WrappedArray(9, 10, 11, 12), WrappedArray(20, 21, 22, 23, 24, 25, 26)] 

In its usual representation this would look like this:

[[9, 10, 11, 12], [20, 21, 22, 23, 24, 25, 26]]

I want to perform operations on each of the sub-arrays, for example take a third list and check whether any of its values appears in the first sub-array. I can't seem to find solutions for PySpark 2.0, only older Scala-specific ones.

How does one access (and in general work with) wrapped arrays? What is an efficient way to do what I described above?

1 Answer

You can treat each wrapped array as an individual list. In your example, if you want to find which elements of the second wrapped array are present in the first, you could do something like this:

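# In the pyspark shell, `spark` and `sc` already exist and this setup can be skipped
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Prepare data: userID, array1, array2
data = [
    [10001, [9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]],
    [10002, [8, 1, 2, 3], [49, 3, 6, 5, 6]],
]
rdd = sc.parallelize(data)

# Append a fourth field: the elements of array2 that also appear in array1
df = rdd.map(
    lambda row: row + [[x for x in row[2] if x in row[1]]]
).toDF(["userID", "array1", "array2", "commonElements"])

df.show()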

Output:

+------+---------------+--------------------+--------------+
|userID|         array1|              array2|commonElements|
+------+---------------+--------------------+--------------+
| 10001|[9, 10, 11, 12]|[20, 10, 9, 23, 2...|       [10, 9]|
| 10002|   [8, 1, 2, 3]|    [49, 3, 6, 5, 6]|           [3]|
+------+---------------+--------------------+--------------+
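Since the question asks about efficiency, here is a minimal sketch of the same per-row check using a set for the membership test, so each lookup does not rescan the first list. The helper name `common_elements` is hypothetical and not part of the original answer; the sample rows are the ones used above.

```python
def common_elements(array1, array2):
    """Return the values of array2 that also occur in array1, in array2's order."""
    lookup = set(array1)  # set membership avoids scanning array1 for every element
    return [x for x in array2 if x in lookup]


# Same sample rows as in the answer: userID, array1, array2
rows = [
    [10001, [9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]],
    [10002, [8, 1, 2, 3], [49, 3, 6, 5, 6]],
]

for user_id, array1, array2 in rows:
    print(user_id, common_elements(array1, array2))
# prints:
# 10001 [10, 9]
# 10002 [3]
```

The helper could replace the inline list comprehension in the map step, for example `lambda row: row + [common_elements(row[1], row[2])]`. For arrays as short as these the difference is negligible; it should matter mainly when the first array is long.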

2 Comments

Thanks. What would be a solution using DataFrames instead of RDDs?
Maybe something like .getItem(num), which returns the item at that position when the column holds a list.
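One possible DataFrame-only approach, sketched under assumptions: the nested column is called `arrays`, and the new column names `firstSubarray` and `hasMatch` as well as both function names are hypothetical. It uses `Column.getItem(0)` to pull out the first sub-array and a Python UDF for the check from the question (does any value of a third list occur in the first sub-array). Python UDFs add serialization overhead, so this is meant as an illustration rather than the fastest option.

```python
def any_in_first_subarray(nested, candidates):
    """Return True if any value in `candidates` occurs in nested[0]."""
    # Null columns and empty outer arrays count as "no match"
    if not nested or nested[0] is None:
        return False
    first = set(nested[0])
    return any(value in first for value in candidates)


def with_match_flag(df, candidates, array_col="arrays"):
    """Add `firstSubarray` and `hasMatch` columns to `df` (assumed column names)."""
    # Imported here so the plain-Python helper above stays usable without Spark
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import BooleanType

    # The UDF receives the whole array-of-arrays value as a Python list of lists
    match_udf = udf(
        lambda nested: any_in_first_subarray(nested, candidates),
        BooleanType(),
    )
    return (
        df.withColumn("firstSubarray", col(array_col).getItem(0))
          .withColumn("hasMatch", match_udf(col(array_col)))
    )
```

With a DataFrame `df` that has such a column, `with_match_flag(df, [10, 99]).show()` would add the two columns without going through an RDD.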
