I need an aggregation query that returns a bucket for each of my root folders. Every document in my Elasticsearch index has a field named path, which stores an array of the paths where the document is located (e.g. path=[1.3., 1.2.4., 5., 11.]).

If I use a normal terms aggregation:

"terms": {
    "field": "path.keyword"
}

I get one bucket per unique full path instead:

"buckets" : [
    {
      "key" : "1.3.",
      "doc_count" : 6
    },
    {
      "key" : "11.",
      "doc_count" : 3
    },
    {
      "key" : "5.",
      "doc_count" : 3
    },
    {
      "key" : "1.2.4.",
      "doc_count" : 1
    }
]

I've tried to solve it with a Painless script:

"terms": {
    "script": "doc['path.keyword'].value.substring(0, doc['path.keyword'].value.indexOf('.')  )"
}

but then I only get the last element of each path array:

"buckets" : [
    {
      "key" : "1",
      "doc_count" : 7
    },
    {
      "key" : "11",
      "doc_count" : 3
    }
]

How do I get a bucket for each root folder only?

  • Can you add an example of what your expected output would look like? Thanks! Commented Apr 8, 2020 at 15:28

2 Answers

doc['field'].value returns only a single value from a multi-valued field (the first one in the sorted doc values), so your script never sees the other paths. Instead, the script has to iterate over every value of the field and return an array containing the root substring of each one.

Sample Data:

"hits" : [
      {
        "_index" : "index84",
        "_type" : "_doc",
        "_id" : "yihhWnEBHtQEPt4DqWLz",
        "_score" : 1.0,
        "_source" : {
          "path" : [
            "1.1.1",
            "1.2",
            "2.1.1",
            "12.11"
          ]
        }
      }
    ]

Query

{
  "aggs": {
    "root_path": {
      "terms": {
        "script": {
          "source": "def roots = []; for (def p : doc['path.keyword']) { int dot = p.indexOf('.'); roots.add(dot < 0 ? p : p.substring(0, dot)); } return roots;"
        }
      }
    }
  }
}

Result:

"aggregations" : {
    "root_path" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 1
        },
        {
          "key" : "12",
          "doc_count" : 1
        },
        {
          "key" : "2",
          "doc_count" : 1
        }
      ]
    }
  }

You can also solve this at index time with a custom analyzer that keeps only the first token of each path. The only scripting involved is a one-line predicate.

Mapping

PUT /root_paths
{
    "settings": {
        "analysis": {
            "analyzer": {
                "pattern_first_token_analyzer": {
                    "tokenizer": "dot_split_tokenizer",
                    "filter": [
                        "first_token_filter"
                    ]
                }
            },
            "tokenizer": {
                "dot_split_tokenizer": {
                    "type": "pattern",
                    "pattern": "\\."
                }
            }, 
            "filter": {
                "first_token_filter": {
                    "type": "predicate_token_filter",
                    "script": {
                        "source": """
                            token.position == 0;
                        """
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "path": {
                "type": "text",
                "fields": {
                    "root": {
                        "type": "text",
                        "analyzer": "pattern_first_token_analyzer",
                        "fielddata": true
                    }
                }
            }
        }
    }
}

Documents

PUT /root_paths/_bulk
{"create":{"_id":1}}
{"path":"1.3.5"}
{"create":{"_id":2}}
{"path":"1.2"}
{"create":{"_id":3}}
{"path":"2.6.9"}
{"create":{"_id":4}}
{"path":"10.11.12"}

Aggregations query

GET /root_paths/_search?filter_path=aggregations
{
    "aggs": {
        "by_root_path": {
            "terms": {
                "field": "path.root"
            }
        }
    }
}

Response

{
    "aggregations" : {
        "by_root_path" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
                {
                    "key" : "1",
                    "doc_count" : 2
                },
                {
                    "key" : "10",
                    "doc_count" : 1
                },
                {
                    "key" : "2",
                    "doc_count" : 1
                }
            ]
        }
    }
}
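
To check what the analyzer produces before relying on the aggregation, the _analyze API can be run against the path.root subfield. This is only a quick sanity check, assuming the root_paths index was created with the mapping above; the sample text is arbitrary.

```json
GET /root_paths/_analyze
{
    "field": "path.root",
    "text": "10.11.12"
}
```

If the predicate filter works as intended, the response should contain a single token, 10, and the tokens 11 and 12 should be dropped.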

Another way is to use a runtime field that splits path into a string array on the dot and extracts the first item of the array.
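
A minimal sketch of the runtime-field idea, under several assumptions: Elasticsearch 7.11 or later (where runtime_mappings is available in the search request), a path.keyword subfield like the one in the question (the root_paths mapping above does not define one), and a hypothetical index name my_index and field name root_path. Instead of an actual split it uses indexOf and substring, which gives the same first segment. The triple-quoted script is Kibana Console syntax.

```json
GET /my_index/_search?filter_path=aggregations
{
    "runtime_mappings": {
        "root_path": {
            "type": "keyword",
            "script": {
                "source": """
                    // Emit the part before the first dot of every path value.
                    // The set ensures each root is emitted once per document.
                    def seen = new HashSet();
                    for (def p : doc['path.keyword']) {
                        int dot = p.indexOf('.');
                        def root = dot < 0 ? p : p.substring(0, dot);
                        if (seen.add(root)) {
                            emit(root);
                        }
                    }
                """
            }
        }
    },
    "aggs": {
        "by_root_path": {
            "terms": {
                "field": "root_path"
            }
        }
    }
}
```

Because the field is computed at query time, this needs no reindexing, at the cost of running the script for every matching document on each search.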
