Basically, I'm trying to get the second-level categories from a hierarchically stored string. The problem is that the depth of the hierarchy varies: one product category could have six levels and another only four. Otherwise I would have just implemented predefined levels.

I have some products with categories like so:

[
  {
    title: 'product one',
    categories: [
      'clothing/mens/shoes/boots/steel-toe'
    ]
  },
  {
    title: 'product two',
    categories: [
      'clothing/womens/tops/sweaters/open-neck'
    ]
  },
  {
    title: 'product three',
    categories: [
      'clothing/kids/shoes/sneakers/light-up'
    ]
  },
  {
    title: 'product etc.',
    categories: [
      'clothing/baby/bibs/super-hero'
    ]
  }, 
  ... more products
]

I'm trying to get aggregation buckets like so:

buckets: [
  {
    key: 'clothing/mens',
    ...
  },
  {
    key: 'clothing/womens',
    ...
  },
  {
    key: 'clothing/kids',
    ...
  },
  {
    key: 'clothing/baby',
    ...
  },
]

I've tried looking at filter prefixes, and at includes and excludes on terms, but I can't find anything that works. Could someone please point me in the right direction?

1 Answer

Your category field should be analyzed with a custom analyzer. You may have other plans for the category field itself, so I'll just add a subfield used only for aggregations:

{
  "settings": {
    "analysis": {
      "filter": {
        "category_trimming": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(^\\w+\/\\w+)"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "category_trimming",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "category": {
          "type": "string",
          "fields": {
            "just_for_aggregations": {
              "type": "string",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}
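
To see what the category_trimming filter keeps, here is a rough Python sketch of the same regular expression applied outside Elasticsearch. It is only an approximation: the name aggregation_key is made up for illustration, Python's re module stands in for the Java regex engine, and it assumes that a token matching no pattern is passed through unchanged.

```python
import re

# Same pattern as the "patterns" entry of the category_trimming filter:
# the first two word-character segments separated by a slash.
TOP_TWO_LEVELS = re.compile(r"^\w+/\w+")


def aggregation_key(category: str) -> str:
    """Approximate my_analyzer: keyword tokenizer, pattern capture, lowercase.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    match = TOP_TWO_LEVELS.search(category)
    # Assumption: with no match, the whole value is kept as the token.
    token = match.group(0) if match else category
    return token.lower()


categories = [
    "clothing/womens/tops/sweaters/open-neck",
    "clothing/mens/shoes/boots/steel-toe",
    "clothing/kids/shoes/sneakers/light-up",
    "clothing/baby/bibs/super-hero",
]

# Distinct keys, i.e. what the terms aggregation would bucket on.
print(sorted({aggregation_key(c) for c in categories}))
# ['clothing/baby', 'clothing/kids', 'clothing/mens', 'clothing/womens']
```

One thing to watch: \w does not match a hyphen, so a value such as clothing/t-shirts/long would be cut to clothing/t. If the first or second level can contain hyphens, a pattern built on [^/]+ instead of \w+ is probably a safer choice.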

Test data:

POST /index/test/_bulk
{"index":{}}
{"category": "clothing/womens/tops/sweaters/open-neck"}
{"index":{}}
{"category": "clothing/mens/shoes/boots/steel-toe"}
{"index":{}}
{"category": "clothing/kids/shoes/sneakers/light-up"}
{"index":{}}
{"category": "clothing/baby/bibs/super-hero"}

The query itself:

GET /index/test/_search?search_type=count
{
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.just_for_aggregations",
        "size": 10
      }
    }
  }
}

The results:

   "aggregations": {
      "by_category": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "clothing/baby",
               "doc_count": 1
            },
            {
               "key": "clothing/kids",
               "doc_count": 1
            },
            {
               "key": "clothing/mens",
               "doc_count": 1
            },
            {
               "key": "clothing/womens",
               "doc_count": 1
            }
         ]
      }
   }

3 Comments

I was just looking at those and thought there might be an easier way, but that should work. Thank you sir!
Thanks for the answer, Andrei. On second thought, it looks like the pattern will only go two levels deep. Is there a way to aggregate at any depth? In one situation I might need only 'level1/level2', and in another I might need 'level1/level2/level3' or even 'level1/level2/level3/level4'.
If you want any (and all) "paths" then take a look at the path hierarchy tokenizer.
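
For reference, the path hierarchy tokenizer emits one token per prefix of the path, so every depth becomes available at once. The Python sketch below only imitates that behaviour for the simple case of a slash-delimited value with no leading slash or empty segments. Both function names are hypothetical and this is not the tokenizer's actual implementation.

```python
from typing import List, Optional


def path_hierarchy_tokens(path: str, delimiter: str = "/") -> List[str]:
    """Return every prefix of the path, shortest first (simplified imitation)."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def key_at_depth(path: str, depth: int) -> Optional[str]:
    """Pick the prefix with exactly `depth` levels, or None if the path is shorter."""
    tokens = path_hierarchy_tokens(path)
    return tokens[depth - 1] if 1 <= depth <= len(tokens) else None


print(path_hierarchy_tokens("clothing/baby/bibs/super-hero"))
# ['clothing', 'clothing/baby', 'clothing/baby/bibs', 'clothing/baby/bibs/super-hero']

print(key_at_depth("clothing/mens/shoes/boots/steel-toe", 3))
# clothing/mens/shoes
```

Because a terms aggregation on such a field would return buckets for all depths together, you would still have to narrow it to the level you want, for example with the include option on terms and a pattern that matches a fixed number of segments.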
