Below are two mocked records from my Elasticsearch index, which holds millions of records. I am trying to query ES for all records that have a non-empty, non-null "tags" field. If a record doesn't have any tags (like the second record below), I don't want to pull it from ES.

If "books" were not nested, then from googling around it seems the query below would have worked:

curl -XGET 'host:port/book_indx/book/_search?' -d '{
    "query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source"}}}}
}'

However, I can't find a way to query the nested structure. I tried the following with no luck:

{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source.tags"}}}}}

{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source":{"tags"}}}}}}

Any suggestions are really appreciated here! Thanks in advance.

{
"_shards": {
    "failed": 0,
    "successful": 12,
    "total": 12
},
"hits": {
    "hits": [
        {
            "_id": "book1",
            "_index": "book",
            "_source": {
                "book_name": "How to Get Organized",
                "publication_date": "2014-02-24T16:50:39+0000",
                "tags": [
                    {
                        "category": "self help",
                        "topics": [
                            {
                                "name": "time management",
                                "page": 6198
                            },
                            {
                                "name": "calendar",
                                "page": 10
                            }
                        ],
                        "id": "WEONWOIR234LI"
                    }
                ],
                "last_updated": "2015-11-11T16:28:32.308+0000"
            },
            "_type": "book"
        },
        {
            "_id": "book2",
            "_index": "book",
            "_source": {
                "book_name": "How to Cook",
                "publication_date": "2014-02-24T16:50:39+0000",
                "tags": [],
                "last_updated": "2015-11-11T16:28:32.308+0000"
            },
            "_type": "book"
        }
    ],
    "total": 1
},
"timed_out": false,
"took": 80

}
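
To make the goal concrete, here is a minimal Python sketch of the selection I am after, applied client-side to trimmed copies of the two mocked hits above. The helper name has_tags is hypothetical, and this only illustrates the intended result; it is not how Elasticsearch evaluates the query, and it would not scale to millions of records.

```python
# Illustrative only: trimmed copies of the two mocked hits from the response above.
hits = [
    {
        "_id": "book1",
        "_source": {
            "book_name": "How to Get Organized",
            "tags": [{"category": "self help", "id": "WEONWOIR234LI"}],
        },
    },
    {"_id": "book2", "_source": {"book_name": "How to Cook", "tags": []}},
]


def has_tags(hit):
    """Return True when the hit's _source carries a non-null, non-empty "tags" list."""
    return bool(hit.get("_source", {}).get("tags"))


# Keep only the ids of records that actually have tags.
wanted_ids = [hit["_id"] for hit in hits if has_tags(hit)]
print(wanted_ids)
```

Only book1 should survive this check, which is the behaviour I want from the ES query itself.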

Mapping -

        "book": {
            "_id": {
                "path": "message_id"
            },
            "properties": {
                "book_name": {
                    "index": "not_analyzed",
                    "type": "string"
                },
                "publication_date": {
                    "format": "date_time||date_time_no_millis",
                    "type": "date"
                },
                "tags": {
                    "properties": {
                        "category": {
                            "index": "not_analyzed",
                            "type": "string"
                        },
                        "topic": {
                            "properties": {
                                "name": {
                                    "index": "not_analyzed",
                                    "type": "string"
                                },
                                "page": {
                                    "index": "no",
                                    "type": "integer"
                                }                     
                            }
                        },
                        "id": {
                            "index": "not_analyzed",
                            "type": "string"
                        }
                    },
                    "type": "nested"
                },
                "last_updated": {
                    "format": "date_time||date_time_no_millis",
                    "type": "date"
                }
            }
        }   
  • Can you also share your mapping for the book type? Is the tags field a nested field or a normal object field? I'm also surprised to not see the _source in your documents. Commented Dec 5, 2015 at 5:09
  • @Val thanks for pointing out that _source was missing; I accidentally renamed it. I've made the updates above and included the mapping. Commented Dec 5, 2015 at 16:12

1 Answer

Since your tags field is mapped with type nested, you need to use a nested filter in order to query it.

The following filtered query will correctly return only the first document above (i.e. the one with id book1):

{
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "tags",
          "filter": {
            "exists": {
              "field": "tags"
            }
          }
        }
      }
    }
  }
}
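
The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so on newer versions the same idea is usually written as a bool query with a filter clause wrapping a nested query. Below is a minimal Python sketch that only builds and prints such a request body. The helper name nested_exists_query is hypothetical, and checking the leaf field tags.id (rather than tags itself) is an assumption that every tag carries an id, as in the sample document.

```python
import json


def nested_exists_query(path, field):
    """Build a search body matching documents that have at least one
    nested object under `path` in which `field` exists."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        "path": path,
                        "query": {"exists": {"field": field}},
                    }
                }
            }
        }
    }


# Assumption: "tags.id" is present on every tag, so its existence implies
# the document has at least one tag.
body = nested_exists_query("tags", "tags.id")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request in the same way as the query above.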

1 Comment

Thanks for the suggestion. It worked great! (Apologies for the delayed response.)
