I am trying to query for all Users that have at least one color in common with a particular User. I have been able to do that, but I cannot figure out how to aggregate the results so that I get each user along with the colors they have in common.

Part of my document for a sample user is as follows:

{
    // ... other fields
    "colors" : [
        {
            "id" : 1,
            "name" : "Green"
        },
        {
            "id" : 7,
            "name" : "Blue"
        }
    ]
}

This is my query for getting the colors in common with another User that has the colors Red, Orange and Green:

{
  "query": {
    "nested": {
      "path": "colors",
      "scoreMode": "sum",
      "query": {
        "function_score": {
          "filter": {
            "terms": {
              "colors.name": [
                "Red","Orange","Green"
              ]
            }
          },
          "functions": [
            // Functions here for custom scoring
          ]
        }
      }
    }
  }
}

How can I aggregate the Users with the colors in common?

  • If I understand your question correctly and using the example you've provided, do you want the list of users for each of the following colors: Red, Orange and Green? Commented Jan 16, 2016 at 3:04
  • No, let me explain it again. Let's say I have a User A who has the colors Red, Orange and Green. I am searching for all users that have at least one of those colors, and I then want to aggregate those users with the colors they have in common with User A. So if User X had the colors Blue, Green and Yellow and User Y had the colors Blue, Red and Orange, I want to get back User X - [Green] and User Y - [Red, Orange]. Does that make sense? I'm basically trying to get the colors in common between the current user and all users in my database. Commented Jan 16, 2016 at 3:10
  • OK I understand your question now. Commented Jan 16, 2016 at 3:22
  • Can I assume your document has a field called say "user_id" along with the field "colors"? Commented Jan 16, 2016 at 3:23
  • Yes that's fine. It does have a user_id field. Commented Jan 16, 2016 at 3:30
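
To make the expected result concrete, here is a minimal Python sketch of the intersection described in the comments above, computed in memory rather than in Elasticsearch. The user names and colors for A, X and Y come from the comment; the function name `common_colors` and "User Z" are made up for illustration.

```python
def common_colors(reference, others):
    """Map each user to the colors shared with the reference user.

    Users with no color in common are left out, and colors keep the
    order they have in each user's own list.
    """
    wanted = set(reference)
    result = {}
    for user, colors in others.items():
        shared = [color for color in colors if color in wanted]
        if shared:
            result[user] = shared
    return result


user_a = ["Red", "Orange", "Green"]
others = {
    "User X": ["Blue", "Green", "Yellow"],
    "User Y": ["Blue", "Red", "Orange"],
    "User Z": ["Purple"],  # hypothetical user with nothing in common
}

print(common_colors(user_a, others))
# {'User X': ['Green'], 'User Y': ['Red', 'Orange']}
```

This is only the target shape of the answer; the goal of the question is to get Elasticsearch to produce the same mapping with an aggregation.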

2 Answers

You need to use a nested aggregation, then apply a filter aggregation for the colors, and finally use top_hits to get the matching colors. I am using source filtering to return only the color value.

This is the query:

{
  "size": 0,
  "query": {
    "nested": {
      "path": "colors",
      "query": {
        "terms": {
          "colors.color": [
            "green",
            "red"
          ]
        }
      }
    }
  },
  "aggs": {
    "user": {
      "terms": {            // get users by unique name or user_id
        "field": "name",
        "size": 10
      },
      "aggs": {
        "nested_color_path": {  // go inside the nested documents
          "nested": {
            "path": "colors"
          },
          "aggs": {
            "match_color": {
              "filter": {         // use the filter to match the colors
                "terms": {
                  "colors.color": [
                    "green",
                    "red"
                  ]
                }
              },
              "aggs": {
                "get_match_color": {  // use this to get the matched colors
                  "top_hits": {
                    "size": 10,
                    "_source": {
                      "include": "name"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
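
As a rough sketch of how the result of this query could be read on the client, the Python below walks the aggregation names used above (`user`, `nested_color_path`, `match_color`, `get_match_color`) and collects the matched colors per user. The `sample_response` is hand-written for illustration, not real Elasticsearch output, and the helper name `matched_colors` is made up. I am assuming each hit's `_source` holds the color under the field passed as `field`; the exact `_source` shape depends on your mapping and Elasticsearch version.

```python
def matched_colors(response, field="name"):
    """Collect the matched nested color values for each user bucket."""
    result = {}
    for bucket in response["aggregations"]["user"]["buckets"]:
        match = bucket["nested_color_path"]["match_color"]
        hits = match["get_match_color"]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"][field] for hit in hits]
    return result


def sample_bucket(user, colors):
    """Build one hand-written user bucket (illustrative data only)."""
    hits = [{"_source": {"name": color}} for color in colors]
    return {
        "key": user,
        "nested_color_path": {
            "match_color": {"get_match_color": {"hits": {"hits": hits}}}
        },
    }


sample_response = {
    "aggregations": {
        "user": {
            "buckets": [
                sample_bucket("User X", ["Green"]),
                sample_bucket("User Y", ["Red", "Orange"]),
            ]
        }
    }
}

print(matched_colors(sample_response))
# {'User X': ['Green'], 'User Y': ['Red', 'Orange']}
```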

1 Comment

How can I do that for a multivalue bucket? Separate question - stackoverflow.com/questions/43451667/…

You have to use nested aggregations to achieve this. See the query below:

POST <index>/<type>/_search
{
   "query": {
      "nested": {
         "path": "colors",
         "query": {
            "terms": {
               "colors.name": [
                  "Red",
                  "Orange",
                  "Green"
               ]
            }
         }
      }
   },
   "aggs": {
      "users_with_common_colors": {
         "terms": {
            "field": "user_id",
            "size": 0,
            "order": {
               "color_distribution>common": "desc"  // sorts the users in descending order of the number of common colors
            }
         },
         "aggs": {
            "color_distribution": {
               "nested": {
                  "path": "colors"
               },
               "aggs": {
                  "common": {
                     "filter": {
                        "terms": {
                           "colors.name": [
                              "Red",
                              "Orange",
                              "Green"
                           ]
                        }
                     },
                     "aggs": {
                        "colors": {
                           "terms": {
                              "field": "colors.name",
                              "size": 0
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}
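
The color list appears twice in the request (once in the query, once in the filter aggregation), so it can help to build the body from a single list. Below is a minimal Python sketch of that idea; the function name `build_common_colors_query` is made up, and it only builds the request body without talking to Elasticsearch. I left out the `order` clause because the comments below report it failing, and `"size": 0` is copied from the query above as written, so check whether your Elasticsearch version accepts it.

```python
import json


def build_common_colors_query(colors, user_field="user_id"):
    """Build the search body for users sharing any of the given colors."""

    def color_terms():
        # A fresh dict each time so the two clauses do not share state.
        return {"terms": {"colors.name": list(colors)}}

    return {
        "query": {"nested": {"path": "colors", "query": color_terms()}},
        "aggs": {
            "users_with_common_colors": {
                "terms": {"field": user_field, "size": 0},
                "aggs": {
                    "color_distribution": {
                        "nested": {"path": "colors"},
                        "aggs": {
                            "common": {
                                "filter": color_terms(),
                                "aggs": {
                                    "colors": {
                                        "terms": {
                                            "field": "colors.name",
                                            "size": 0,
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_common_colors_query(["Red", "Orange", "Green"])
print(json.dumps(body, indent=2))
```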

5 Comments

Thanks, this is exactly what I needed! I was wondering if it would be possible to sort the aggregations by the number of common colors? I tried adding an order but ran into errors since nested is a multivalue bucket.
Yes you can. See my updated answer where I'm sorting the users in descending order of the number of common colors. You can also sort them in ascending order if you wish. Read more at elastic.co/guide/en/elasticsearch/reference/current/….
I did try that but I got the following error: "Invalid terms aggregation order path [color_distribution>common]. Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end. Sub-path [color_distribution] points to non single-bucket aggregation"
Nvm, I figured out how to do it using top_hits. Thanks for your help!
@bittusarkar - How can I do that for a multivalue bucket? Separate question - stackoverflow.com/questions/43451667/…
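
If the `order` path keeps failing as in the error quoted above, one possible workaround is to sort on the client instead. This is not necessarily what the asker ended up doing with top_hits; it is only a sketch. It assumes the aggregation names from the answer (`users_with_common_colors`, `color_distribution`, `common`, `colors`), and the `sample_response` is hand-written for illustration, not real Elasticsearch output.

```python
def users_by_common_colors(response):
    """Return (user_id, colors) pairs, most common colors first."""
    rows = []
    buckets = response["aggregations"]["users_with_common_colors"]["buckets"]
    for bucket in buckets:
        color_buckets = bucket["color_distribution"]["common"]["colors"]["buckets"]
        rows.append((bucket["key"], [c["key"] for c in color_buckets]))
    # list.sort is stable, so ties keep the order Elasticsearch returned.
    rows.sort(key=lambda row: len(row[1]), reverse=True)
    return rows


def sample_bucket(user_id, colors):
    """Build one hand-written user bucket (illustrative data only)."""
    color_buckets = [{"key": color, "doc_count": 1} for color in colors]
    return {
        "key": user_id,
        "color_distribution": {"common": {"colors": {"buckets": color_buckets}}},
    }


sample_response = {
    "aggregations": {
        "users_with_common_colors": {
            "buckets": [
                sample_bucket(1, ["Green"]),
                sample_bucket(2, ["Orange", "Red"]),
            ]
        }
    }
}

print(users_by_common_colors(sample_response))
# [(2, ['Orange', 'Red']), (1, ['Green'])]
```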
