NOTE: I'm working in Spark 2.4.4
I have the following dataset, where col1 is an array of JSON strings:
col1
['{"key1": "val1"}','{"key2": "val2"}']
['{"key1": "val1"}','{"key2": "val3"}']
Essentially, I'd like to filter out any rows where no element of col1 has key2 equal to "val2", leaving only:
col1
['{"key1": "val1"}','{"key2": "val2"}']
In Trino SQL, I can do it like this:
any_match(col1, x -> json_extract_scalar(x, '$.key2') = 'val2')
But any_match isn't available in Spark 2.4.
My only idea is to explode col1 and then filter on each exploded element with the following, which isn't efficient:
df.filter(F.get_json_object(F.col("col1"), '$.key2') == 'val2')
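Spelled out, the explode-based version would look something like this (a sketch; the alias names are mine, and the dropDuplicates at the end only matters if more than one element per row can match):

import pyspark.sql.functions as F

# one row per array element, keeping the original array alongside it
exploded = df.select("col1", F.explode("col1").alias("element"))

# keep rows where the exploded element has key2 = "val2",
# then recover the original arrays
result = (
    exploded
    .filter(F.get_json_object(F.col("element"), "$.key2") == "val2")
    .select("col1")
    .dropDuplicates()
)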
I'm wondering if I can do this without exploding in my version of Spark (2.4.4).
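For what it's worth, I believe Spark 2.4 added higher-order functions like exists to Spark SQL, even though there's no DataFrame API for them until 3.x, so maybe something like this via expr() would work (untested sketch):

# exists() returns true if any array element satisfies the lambda,
# so no explode is needed
df.filter(F.expr("exists(col1, x -> get_json_object(x, '$.key2') = 'val2')"))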