I want to first filter only the rows whose nested column contains Max, and then explode only those rows.
My Avro Record:
{
  "name": "Parent",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {
      "name": "children",
      "type": {
        "type": "array",
        "items": {
          "name": "child",
          "type": "record",
          "fields": [
            {"name": "name", "type": "string"},
            {"name": "price", "type": ["long", "null"]}
          ]
        }
      }
    }
  ]
}
I am using a Spark SQLContext to query the DataFrame that is read in. So if the input is
Row no  firstname  children
1       John       [[Max, 20], [Pg, 22]]
2       Bru        [[huna, 10], [aman, 12]]
I first query by exploding the inner array, so each element of the nested column becomes its own row:
Row no  firstname  children.name  children.price
1       John       Max            20
1       John       Pg             22
2       Bru        huna           10
2       Bru        aman           12
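For reference, the exploded result above comes from a query like the following (a sketch: the load path and the temp-table name `parent` are assumptions, and it assumes an Avro data source is on the classpath):

```scala
// Read the Avro data and register it so Spark SQL can query it.
// The path and table name "parent" are placeholders for this example.
val df = sqlc.read.format("com.databricks.spark.avro").load("/path/to/parent.avro")
df.registerTempTable("parent")

// Explode the nested array: each element of `children` becomes its own row.
val exploded = sqlc.sql(
  """SELECT firstname, child.name, child.price
    |FROM parent
    |LATERAL VIEW explode(children) childTable AS child""".stripMargin)
```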
q1) I want to first filter only the rows that contain Max, and then explode only those rows. In the current situation, if one column holds millions of values, Spark first generates the millions of rows and only then checks whether Max is present.
q2) I want to first filter only the rows that have price > 12, and then explode only those rows. In the current situation, if one column holds millions of values, Spark first generates the millions of rows and only then checks whether price > 12 holds.
Something like this:

val results = sqlc.sql("SELECT firstname, child.name FROM parent LATERAL VIEW explode(children) childTable AS child WHERE child.price > 12")
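One way to prune the array before exploding, assuming Spark 2.4 or later is available, is the `filter` higher-order function, which drops non-matching elements inside each array so only the survivors are ever turned into rows. This is an untested sketch against the schema above:

```scala
// Prune each `children` array first, then explode only the surviving elements.
// filter(array, lambda) is a Spark SQL higher-order function (Spark 2.4+).
val results = sqlc.sql(
  """SELECT firstname, child.name, child.price
    |FROM parent
    |LATERAL VIEW explode(filter(children, c -> c.price > 12)) childTable AS child""".stripMargin)
```

On older versions it is also worth checking `results.explain()`: Catalyst can sometimes push a simple WHERE predicate past the generate step on its own, in which case the naive query may not materialize all the rows first.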