I have a column of ArrayType in PySpark. I want to filter the values inside the array for every row (I don't want to filter out the rows themselves!) without using a UDF.

For instance, given this dataset with a column A of ArrayType:

|     A      |
______________
|[-2, 1, 7]  |
|[1]         |
|[-4, -1, -3]|

If I wanted to keep only the positive values, the output would be:

|     A      |
______________
|[1, 7]      |
|[1]         |
|[]          |

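For reference, a DataFrame matching the tables above can be built like this (a minimal sketch, not part of the original question; the names spark and df are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Column A is inferred as array<bigint> from the nested Python lists
df = spark.createDataFrame(
    [([-2, 1, 7],), ([1],), ([-4, -1, -3],)],
    ["A"],
)
df.show()
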
1 Answer

For Spark 2.4 and above, use the built-in higher-order function filter inside expr:

from pyspark.sql.functions import expr

# filter(A, x -> x > 0) keeps, per row, only the array elements greater than 0
df.withColumn("A", expr("filter(A, x -> x > 0)")).show()
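As a side note (not from the original answer): on Spark 3.1 and above the same element-wise filter is also exposed as a native DataFrame function, pyspark.sql.functions.filter, so the predicate can be written as a Python lambda instead of a SQL string. A minimal sketch, reusing the df from the question:

from pyspark.sql.functions import col, filter as array_filter

# The predicate is applied per array element; rows are never dropped.
# A row whose elements all fail the predicate simply ends up with an empty array.
df.withColumn("A", array_filter(col("A"), lambda x: x > 0)).show()
# Rows become [1, 7], [1], [] respectively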