
Elasticsearch v7.0

Hello and good day!

I'm trying to create a query with this condition: if a nested field has only one element, get that element; if it has two or more elements, get the nested element that matches a condition.

Scenario:

I have an index named socialmedia with a nested field named cms, which stores the sentiment for each document.

An example document's cms field looks like this:

"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]

By default, the cms field contains a first element with "cli_id" : 0 (meaning it is visible to all clients/users), but over time it grows like this:

"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    }
]

The 2nd and 3rd elements show that the clients with cli_id 1 and 2 have each given a sentiment for that document.

Now, I want to write a query such that, if the logged-in client has no sentiment yet for a specific document, it fetches the cmx_sentiment of the element with "cli_id" : 0.

BUT, if the logged-in client has a sentiment for the documents returned by his filters, the query should fetch the cmx_sentiment of the element whose cli_id matches the logged-in client.

For example: the client with cli_id 2 will get the cmx_sentiment **Neutral** for the document above.

The client with cli_id 5 will get the cmx_sentiment **Negative** because he hasn't given a sentiment for the document.

PSEUDO CODE:

If a document has a sentiment given by the client, get the cmx_sentiment of the element whose cli_id == the client's ID

If a document is fresh or the client HAS NOT yet labeled a sentiment on it, get the cmx_sentiment of the element with cli_id == 0

I need a query that implements the pseudo code above.
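A minimal Python sketch of the pseudo code above, applied to one document's cms list, to pin down the intended per-document result. The function name `effective_sentiment` is hypothetical, and it assumes every document has a `cli_id: 0` element, as described above.

```python
from typing import List, Optional


def effective_sentiment(cms: List[dict], client_id: int) -> Optional[str]:
    """Return the sentiment the logged-in client should see for one document."""
    default = None
    for element in cms:
        if element["cli_id"] == client_id:
            # The client labeled this document: their own sentiment wins.
            return element["cmx_sentiment"]
        if element["cli_id"] == 0:
            # Remember the default (cli_id 0) sentiment as a fallback.
            default = element["cmx_sentiment"]
    # No element for this client: fall back to the cli_id 0 sentiment.
    return default


cms = [
    {"cli_id": 0, "cmx_sentiment": "Negative"},
    {"cli_id": 1, "cmx_sentiment": "Positive"},
    {"cli_id": 2, "cmx_sentiment": "Neutral"},
]
print(effective_sentiment(cms, 2))  # Neutral
print(effective_sentiment(cms, 5))  # Negative
```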

Here's my sample query:

"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // I'm planning to add a bool clause here to test whether cli_id equals the logged-in client's ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]

Is my query correct?

The problem with the query I have provided is that it counts every matching nested element instead of picking only one.

For example, with the query above:

The client with cli_id 2 logs in.

Both the Neutral and the Negative cmx_sentiment are retrieved, instead of Neutral alone.
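To see why both values come back, here is a rough Python model of the nested + filter + terms aggregation in the question (an illustration of the counting, not how Elasticsearch executes it). Inside the nested aggregation every cms element is counted separately, and the should filter keeps every element whose cli_id is 0 *or* the client's ID, so a single document contributes two elements. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def should_filter_counts(docs: List[List[dict]], client_id: int) -> Dict[str, int]:
    """Model the question's aggregation: count nested elements, not documents."""
    counts: Counter = Counter()
    for cms in docs:
        for element in cms:
            # Same condition as the bool/should filter: cli_id 0 OR the client's ID.
            if element["cli_id"] in (0, client_id):
                counts[element["cmx_sentiment"]] += 1
    return dict(counts)


doc = [
    {"cli_id": 0, "cmx_sentiment": "Negative"},
    {"cli_id": 1, "cmx_sentiment": "Positive"},
    {"cli_id": 2, "cmx_sentiment": "Neutral"},
]
print(should_filter_counts([doc], 2))  # {'Negative': 1, 'Neutral': 1}
```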

1 Answer

After a discussion with the OP, I'm rewriting this answer.

To get the desired result, consider the following when building the query and the aggregation:

Query:

This will contain any filters applied by the logged-in user. For the purpose of this example I'm using match_all, since every document has at least one nested doc in the cms field, i.e. the one for cli_id: 0.

Aggregation:

Here we have to divide the aggregations into two:

  1. default_only
  2. sentiment_only

default_only

This aggregation counts the documents that don't have a nested document for cli_id: <logged in client id>, i.e. the docs that only have the nested doc for cli_id: 0. To do this, follow the steps below:

  1. default_only: Use a filter aggregation to get the documents that do not have a nested document for cli_id: <logged in client id>, i.e. using must_not => cli_id: <logged in client id>.
  2. default_nested: Add a nested sub-aggregation, since the sentiment we need to count is a field of the nested documents.
  3. sentiment_for_cli_id: Add a filter sub-aggregation to default_nested to get the sentiment of the default client only, i.e. cli_id: 0.
  4. default: Add this terms sub-aggregation to sentiment_for_cli_id to get the counts per sentiment. Note that this count is of nested docs, not parent docs; since each document has only one nested doc per cli_id, it happens to equal the document count, but it is not the same thing.
  5. the_doc_count: Add this reverse_nested sub-aggregation to default to step back out of the nested docs and get the count of parent docs.
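The difference between the nested doc_count in step 4 and the reverse_nested count in step 5 only shows up if a document ever has more than one element for the same cli_id. A rough Python model of the two counts (not Elasticsearch's implementation), using a hypothetical document that breaks the one-element-per-client assumption:

```python
from collections import Counter
from typing import Dict, List, Tuple


def nested_vs_parent_counts(
    docs: List[List[dict]], cli_id: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (nested element counts, parent document counts) per sentiment."""
    nested: Counter = Counter()
    parents: Counter = Counter()
    for cms in docs:
        matched = [el["cmx_sentiment"] for el in cms if el["cli_id"] == cli_id]
        # Like the terms bucket doc_count: every matching nested element counts.
        nested.update(matched)
        # Like reverse_nested: each parent document counts once per bucket.
        parents.update(set(matched))
    return dict(nested), dict(parents)


docs = [
    [{"cli_id": 0, "cmx_sentiment": "Positive"}],
    # Hypothetical: two cli_id 0 elements in the same document.
    [{"cli_id": 0, "cmx_sentiment": "Positive"}, {"cli_id": 0, "cmx_sentiment": "Positive"}],
]
print(nested_vs_parent_counts(docs, 0))  # ({'Positive': 3}, {'Positive': 2})
```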
sentiment_only

This aggregation gives the count for each sentiment in documents where cli_id: <logged in client id> is present. It follows the same approach as the default_only aggregation, with these tweaks:

  1. sentiment_only: must => cli_id: <logged in client id>
  2. sentiment_nested: same reason as above
  3. sentiment_for_cli_id: same, but instead of the default (cli_id: 0) we filter for cli_id: <logged in client id>
  4. sentiment: same as default
  5. the_doc_count: same as above
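The two branches differ only in the tweaks listed above, so when building the request from application code they can come from one helper. A minimal Python sketch that produces the same JSON as the example query below for a given client ID. `sentiment_branch`, `search_body`, and the `user_filters` parameter (standing in for the logged-in user's own filters, with match_all as the fallback) are hypothetical names, not part of any client library.

```python
import json
from typing import List, Optional


def sentiment_branch(client_id: int, labeled: bool) -> dict:
    """Build default_only (labeled=False) or sentiment_only (labeled=True)."""
    # Does the document have a nested cms element for this client?
    has_client = {
        "nested": {"path": "cms", "query": {"term": {"cms.cli_id": client_id}}}
    }
    # Tweak 1: must (client labeled the doc) vs must_not (client did not).
    clause = "must" if labeled else "must_not"
    # Tweak 3: read the client's own element, or the default cli_id 0 element.
    inner_cli_id = client_id if labeled else 0
    nested_name = "sentiment_nested" if labeled else "default_nested"
    terms_name = "sentiment" if labeled else "default"
    return {
        "filter": {"bool": {clause: [has_client]}},
        "aggs": {
            nested_name: {
                "nested": {"path": "cms"},
                "aggs": {
                    "sentiment_for_cli_id": {
                        "filter": {"term": {"cms.cli_id": inner_cli_id}},
                        "aggs": {
                            terms_name: {
                                "terms": {"field": "cms.cmx_sentiment"},
                                # reverse_nested: count parent docs per bucket.
                                "aggs": {"the_doc_count": {"reverse_nested": {}}},
                            }
                        },
                    }
                },
            }
        },
    }


def search_body(client_id: int, user_filters: Optional[List[dict]] = None) -> dict:
    """Request body; match_all stands in when there are no user filters."""
    query = {"bool": {"filter": user_filters}} if user_filters else {"match_all": {}}
    return {
        "query": query,
        "aggs": {
            "default_only": sentiment_branch(client_id, labeled=False),
            "sentiment_only": sentiment_branch(client_id, labeled=True),
        },
    }


print(json.dumps(search_body(2), indent=2))
```

The resulting dict can be sent as the body of GET socialmedia/_search. Since only the counts are needed for a report, adding "size": 0 to the body would skip returning the hits themselves.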

Example:

PUT socialmedia/_bulk
{"index":{"_id": 1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id": 2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id": 3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id": 4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
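The bulk request above assumes socialmedia already exists with cms mapped as nested and cmx_sentiment as keyword. With dynamic mapping, cms would become a plain object field, so the nested query and aggregations would fail, and cmx_sentiment would become text, which the terms aggregation cannot use without fielddata. A sketch of such a mapping as a Python dict for a PUT socialmedia request; the exact field types are an assumption, so check them against your real mapping.

```python
import json

# Assumed mapping: cms must be "nested" for the nested query/aggregations,
# and cmx_sentiment a "keyword" so the terms aggregation can bucket on it.
create_index_body = {
    "mappings": {
        "properties": {
            "cms": {
                "type": "nested",
                "properties": {
                    "cli_id": {"type": "integer"},
                    "cmx_sentiment": {"type": "keyword"},
                },
            }
        }
    }
}

# Body for: PUT socialmedia
print(json.dumps(create_index_body, indent=2))
```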
Query:
GET socialmedia/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "default_only": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "default_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 0
                }
              },
              "aggs": {
                "default": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "sentiment_only": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "sentiment_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 2
                }
              },
              "aggs": {
                "sentiment": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Agg Output:
 "aggregations" : {
    "default_only" : {
      "doc_count" : 1,
      "default_nested" : {
        "doc_count" : 1,
        "sentiment_for_cli_id" : {
          "doc_count" : 1,
          "default" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Positive",
                "doc_count" : 1,
                "the_doc_count" : {
                  "doc_count" : 1
                }
              }
            ]
          }
        }
      }
    },
    "sentiment_only" : {
      "doc_count" : 3,
      "sentiment_nested" : {
        "doc_count" : 6,
        "sentiment_for_cli_id" : {
          "doc_count" : 3,
          "sentiment" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Neutral",
                "doc_count" : 2,
                "the_doc_count" : {
                  "doc_count" : 2
                }
              },
              {
                "key" : "Negative",
                "doc_count" : 1,
                "the_doc_count" : {
                  "doc_count" : 1
                }
              }
            ]
          }
        }
      }
    }
  }
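Since the goal in the comments below is a single set of counts for a chart, the two branches can be merged on the client side by adding up each bucket's the_doc_count. This is safe because default_only (must_not) and sentiment_only (must) filter on the same condition, so no document is counted in both. A minimal Python sketch, assuming aggregations is the parsed aggregations object from the response (trimmed here to the fields the function reads); the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict


def merged_sentiment_counts(aggregations: dict) -> Dict[str, int]:
    """Add up parent-document counts from default_only and sentiment_only."""
    totals: Counter = Counter()
    branches = [
        aggregations["default_only"]["default_nested"]["sentiment_for_cli_id"]["default"],
        aggregations["sentiment_only"]["sentiment_nested"]["sentiment_for_cli_id"]["sentiment"],
    ]
    for terms_agg in branches:
        for bucket in terms_agg["buckets"]:
            # the_doc_count (reverse_nested) counts parent docs, not nested elements.
            totals[bucket["key"]] += bucket["the_doc_count"]["doc_count"]
    return dict(totals)


# The "Agg Output" above, reduced to the keys used here.
aggregations = {
    "default_only": {"default_nested": {"sentiment_for_cli_id": {"default": {"buckets": [
        {"key": "Positive", "doc_count": 1, "the_doc_count": {"doc_count": 1}},
    ]}}}},
    "sentiment_only": {"sentiment_nested": {"sentiment_for_cli_id": {"sentiment": {"buckets": [
        {"key": "Neutral", "doc_count": 2, "the_doc_count": {"doc_count": 2}},
        {"key": "Negative", "doc_count": 1, "the_doc_count": {"doc_count": 1}},
    ]}}}},
}
print(merged_sentiment_counts(aggregations))  # {'Positive': 1, 'Neutral': 2, 'Negative': 1}
```

If the other shape of result is wanted instead ({default: 1, with_sentiment: 3}), it is simply the top-level doc_count of default_only and sentiment_only.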
Comments

Hello sir, thank you for your answer! However, I need aggs because I'm building a report. I'm not after the documents themselves but the doc_count, so I can display it as a chart in my application
Do you actually want the count, or the sentiment value for each doc after some filters are applied, or the doc count for every sentiment?
Only the count, sir @OpsterESNinjaNishant, because the original goal of this app is to generate a report
Additionally, according to your answer, I shouldn't get the Negative sentiment, because client 2 has already given a sentiment for that document. Instead, he should get only the Neutral sentiment
@Suomynona My question is this: say there are 4 docs in total. Doc 1 has no nested doc in cms with cli_id: <logged in cli_id> and has sentiment positive for cli_id: 0; doc 2 has cli_id: <logged in cli_id> with sentiment neutral; doc 3 has sentiment negative and doc 4 neutral for cli_id: <logged in cli_id>. Which count do you want: {default: 1, with_sentiment: 3} OR {"positive": 1, "neutral": 2, "negative": 1}?