
I have a DataFrame in PySpark that has a nested array value for one of its fields. I would like to filter the DataFrame where the array contains a certain string. I'm not seeing how I can do that.

The schema looks like this:

root
 |-- name: string (nullable = true)
 |-- lastName: array (nullable = true)
 |    |-- element: string (containsNull = false)

I want to return all the rows where upper(name) == 'JOHN' and where the lastName column (the array) contains 'SMITH', with that comparison also being case-insensitive (like I did for the name). I found the isin() function on a column value, but that seems to work backwards from what I want. It seems like I need a contains() function on a column value. Anyone have any ideas for a straightforward way to do this?
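For reference, here's a tiny example of building a DataFrame with this shape (illustrative data only, not my real dataset):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative rows only; the real data has the schema shown above.
dataframe = spark.createDataFrame(
    [("John", ["Smith", "Jones"]), ("Jane", ["Doe"])],
    ["name", "lastName"],
)
dataframe.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- lastName: array (nullable = true)
#  |    |-- element: string (containsNull = true)   <- containsNull may differ from my real schema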

2 Answers


You could consider working on the underlying RDD directly.

def my_filter(row):
    # Emit the row (once) if the name matches and any element of
    # lastName matches 'SMITH', both case-insensitively.
    if row.name.upper() == 'JOHN':
        if any(it.upper() == 'SMITH' for it in (row.lastName or [])):
            yield row

dataframe = dataframe.rdd.flatMap(my_filter).toDF()
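For example, on a small test DataFrame shaped like the one in the question (values assumed for illustration), the filtered result would look roughly like this:

dataframe.show()
# +----+--------------+
# |name|      lastName|
# +----+--------------+
# |John|[Smith, Jones]|
# +----+--------------+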



An update in 2019

Spark 2.4.0 introduced higher-order functions such as transform (see the official documentation), so combined with array_contains this can now be done in SQL.

For your problem, it should be

dataframe.filter('upper(name) = "JOHN" AND array_contains(transform(lastName, x -> upper(x)), "SMITH")')

It is better than the previous solution, which uses the RDD as a bridge, because DataFrame operations are much faster than RDD ones.
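If you'd rather stay in the DataFrame API than pass a SQL string, the same condition can go through expr (a sketch; the column names are taken from the question, and expr is used because transform has no Python function wrapper until Spark 3.1):

from pyspark.sql import functions as F

result = dataframe.filter(
    (F.upper(F.col("name")) == "JOHN")
    & F.expr('array_contains(transform(lastName, x -> upper(x)), "SMITH")')
)
result.show()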

