I am using Apache Spark 1.5 DataFrames with Elasticsearch, and I am trying to filter an id from a column that contains a list (array) of ids.
For example, the Elasticsearch mapping of the column looks like this:
{
  "people": {
    "properties": {
      "artist": {
        "properties": {
          "id": {
            "index": "not_analyzed",
            "type": "string"
          },
          "name": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
Example documents look like the following:
{
  "people": {
    "artist": [
      {
        "id": "153",
        "name": "Tom"
      },
      {
        "id": "15389",
        "name": "Cok"
      }
    ]
  }
},
{
  "people": {
    "artist": [
      {
        "id": "369",
        "name": "Carl"
      },
      {
        "id": "15389",
        "name": "Cok"
      },
      {
        "id": "698",
        "name": "Sol"
      }
    ]
  }
}
In Spark I tried this:
val peopleId = 152
val dataFrame = sqlContext.read
.format("org.elasticsearch.spark.sql")
.load("index/type")
dataFrame.filter(dataFrame("people.artist.id").contains(peopleId))
.select("people_sequence.artist.id")
I got back every id that contains 152, for example 1523 and 152978, not only id == 152.
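I think this is because Column.contains does a substring match on strings rather than an equality check. A minimal sketch with made-up data (not my real index) of what I believe is happening:

import sqlContext.implicits._

// contains("152") is a substring test, so longer ids that embed 152 match too
val demo = Seq("152", "1523", "152978", "369").toDF("id")
demo.filter(demo("id").contains("152")).show()
// keeps 152, 1523 and 152978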
Then I tried:
dataFrame.filter(dataFrame("people.artist.id").equalTo(peopleId))
.select("people.artist.id")
I got an empty result. I understand why: people.artist.id is an array, so no single value of the column equals 152.
Can anyone tell me how to filter for an exact id when the column holds a list (array) of ids?
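What I was hoping for is something along the lines of the sketch below, using array_contains from org.apache.spark.sql.functions (available since Spark 1.5.0), but I am not sure this is the right approach, or whether the Elasticsearch connector reads people.artist back as an array here:

import org.apache.spark.sql.functions.array_contains

// if people.artist.id comes back as array<string>, array_contains should test
// for an exact element rather than a substring (ids are strings in the mapping)
dataFrame.filter(array_contains(dataFrame("people.artist.id"), "152"))
  .select("people.artist.id")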