I've got the following simple mapping:

"element": {
  "dynamic": "false",
  "properties": {
    "id": { "type": "string", "index": "not_analyzed" },
    "group": { "type": "string", "index": "not_analyzed" },
    "type": { "type": "string", "index": "not_analyzed" }
  }
} 

This is essentially a way to store a Group object:

{
  id : "...",
  elements : [
    {id: "...", type: "..."},
    ...
    {id: "...", type: "..."}
  ] 
}

I want to find how many different groups share the same sequence of element types (ordered, including repetitions).

An obvious solution would be to change the schema to:

"element": {
  "dynamic": "false",
  "properties": {
    "group": { "type": "string", "index": "not_analyzed" },
    "concatenated_list_of_types": { "type": "string", "index": "not_analyzed" }
  }
} 

However, our requirements mean we need to be able to exclude some types from the group-by (aggregation), so a precomputed concatenation won't do :(

All fields of the document are MongoDB ids, so in SQL I would do something like this:

SELECT COUNT(group_id), concat_value FROM (
    SELECT GROUP_CONCAT(type_id) AS concat_value, group_id
    FROM table
    WHERE type_id != 'some_filtered_out_type_id'
    GROUP BY group_id
) T GROUP BY concat_value
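
To make the intended result concrete, here is a minimal sketch of the same logic in plain Python. It is not Elasticsearch code; the function name `count_type_sequences`, the `(group_id, type_id)` row shape, and the sample ids are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def count_type_sequences(rows, excluded_types=frozenset(), sep=","):
    """Count groups per concatenated type sequence.

    rows: iterable of (group_id, type_id) pairs, in element order
    (hypothetical shape, mirroring the one-document-per-element mapping).
    """
    per_group = defaultdict(list)
    for group_id, type_id in rows:
        # Equivalent of the WHERE clause: drop excluded types first.
        if type_id not in excluded_types:
            per_group[group_id].append(type_id)
    # Equivalent of GROUP_CONCAT per group, then COUNT per concatenated value.
    return Counter(sep.join(types) for types in per_group.values())

rows = [
    ("g1", "a"), ("g1", "b"), ("g1", "a"),
    ("g2", "a"), ("g2", "x"), ("g2", "b"), ("g2", "a"),
    ("g3", "b"),
]
counts = count_type_sequences(rows, excluded_types={"x"})
print(dict(counts))
```

As with the SQL, a group whose elements are all filtered out does not appear in the result at all.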

In Elasticsearch, with the given mapping, it's really easy to filter types out, and counting is also not a problem assuming we have a concatenated value. The missing piece is the concatenation itself: needless to say, the sum aggregation does not work for strings.

How can I get this working? :)

Thanks!

1 Answer

I finally solved this problem with scripting and by changing the mapping:

{
  "mappings": {
    "group": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "elements": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

There is still an issue with duplicate elements in the array: ScriptDocValues.Strings strips out duplicates for some reason. Still, here's an aggregation that counts by concatenated string:

{
  "aggs": {
    "path": {
      "scripted_metric": {
        "map_script": "key = doc['elements'].join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
        "combine_script": "_agg",
        "reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
      }
    }
  }
}

The result would be as follows:

  "aggregations" : {
    "path" : {
      "value" : {
        "5639abfb5cba47087e8b457e" : 362,
        "568bfc495cba47fc308b4567" : 3695,
        "5666d9d65cba47701c413c53" : 14,
        "5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" : 1,
        "570eb97abe529e83498b473d" : 1
      }
    }
  }
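
For readers unfamiliar with the phases of a scripted metric, the following is a rough Python model of what the three scripts above do. It is only a sketch: the per-shard document lists, the function names `map_phase` and `reduce_phase`, and the sample values are assumptions, and nothing here calls Elasticsearch.

```python
from collections import Counter

def map_phase(shard_docs, sep="-"):
    """Model of map_script: runs once per document on a single shard."""
    agg = {}
    for doc in shard_docs:
        key = sep.join(doc["elements"])
        agg[key] = agg.get(key, 0) + 1
    # combine_script simply returns this per-shard map unchanged.
    return agg

def reduce_phase(shard_aggs):
    """Model of reduce_script: merge the per-shard maps, summing counts."""
    total = Counter()
    for agg in shard_aggs:
        total.update(agg)  # Counter.update adds counts for existing keys
    return dict(total)

# Two hypothetical shards holding group documents.
shards = [
    [{"elements": ["t1"]}, {"elements": ["t1", "t2"]}],
    [{"elements": ["t1"]}, {"elements": ["t3"]}],
]
result = reduce_phase(map_phase(docs) for docs in shards)
print(result)
```

The reduce step is needed because the same concatenated key can show up on more than one shard, so the per-shard counts have to be summed rather than overwritten.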
1 Comment

Do you have any idea about this issue? It looks similar to yours. stackoverflow.com/questions/60650823/…