I have:
+-----------------------+-------+------------------------------------+
|cities                 |name   |schools                             |
+-----------------------+-------+------------------------------------+
|[palo alto, menlo park]|Michael|[[stanford, 2010], [berkeley, 2012]]|
|[santa cruz]           |Andy   |[[ucsb, 2011]]                      |
|[portland]             |Justin |[[berkeley, 2014]]                  |
+-----------------------+-------+------------------------------------+
Filtering on an exact match is no problem:
df.select("*").where(array_contains(df("schools.sname"), "berkeley")).show(false)
But without exploding the array or using a UDF, and in the same or a similar way as above, how can I do something like:
return all rows where at least one schools.sname starts with "b"?
e.g.:
df.select("*").where(startsWith(df("schools.sname"), "b")).show(false)
This is wrong, of course; it is only there to demonstrate the point. How can I do something like this without exploding and without a UDF that returns true/false, and more generally, how can I filter like this without any UDF? Maybe it is not possible; I cannot find any such examples. Or is expr what I need?
The answers I received show that certain things require a particular approach, because some capabilities do not exist in Scala. I have since read an article describing new array features that are to be implemented later, which supports that point.
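As an illustration of the kind of array feature mentioned above, here is a minimal sketch that assumes Spark 2.4 or later, where the SQL higher-order function exists can be called through expr. It is not taken from the answers. The case classes School and Person, the field name year, and the helper names sample and schoolsStartingWithB are hypothetical; they only rebuild the table from the question so that the snippet runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical schema mirroring the table in the question.
// "sname" comes from the question; "year" is an assumed field name.
case class School(sname: String, year: Int)
case class Person(cities: Seq[String], name: String, schools: Seq[School])

object StartsWithInArray {

  // Rebuilds the three sample rows shown in the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Person(Seq("palo alto", "menlo park"), "Michael",
        Seq(School("stanford", 2010), School("berkeley", 2012))),
      Person(Seq("santa cruz"), "Andy", Seq(School("ucsb", 2011))),
      Person(Seq("portland"), "Justin", Seq(School("berkeley", 2014)))
    ).toDF()
  }

  // Keeps rows where at least one schools.sname starts with "b".
  // Assumes Spark 2.4+: exists(array, lambda) is a SQL higher-order
  // function, so no explode and no UDF is involved.
  def schoolsStartingWithB(df: DataFrame): DataFrame =
    df.where(expr("exists(schools.sname, s -> s like 'b%')"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("starts-with-in-array")
      .getOrCreate()

    schoolsStartingWithB(sample(spark)).show(false)

    spark.stop()
  }
}
```

With the sample rows from the question, this should keep the Michael and Justin rows and drop Andy. On Spark versions older than 2.4 the exists lambda syntax is not available, which matches the limitation described above.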