
I have an Elasticsearch query that looks like this:

{
  "query": {
        "query_string": {
                "query": "Lorem*",
                "fields": ["search_names", "member_name^2"]
        }
    }
}

It runs against documents that look like this:


{
        "member_name" : "Lorem Ipsum",
        "complaint_periods" : [
            {
                "period": "01/01/2001 - 31/12/2001",
                "complaints": "10"
            },
            {
                "period": "01/01/2002 - 31/12/2002",
                "complaints": "0"
            },
            {
                "period": "01/01/2003 - 31/12/2003",
                "complaints": "3"
            },
            {
                "period": "01/01/2004 - 31/12/2004",
                "complaints": "100"
            }
         ],
        "search_names" : [
            "Lorem Ipsum",
            "dolor sit amet",
            "varius augue",
            "Aliquam fringilla"
        ]
}

So I'm able to retrieve documents based on how closely their name and search names match my query.

The requirement is that a text search box should retrieve the closest name match to the query. However, given relatively similar names, a document whose number of complaints exceeds a threshold of 10 in a time period passed with the query should appear higher in the search results than documents that do not exceed it.

So I need to pass a key for the time period, e.g. "01/01/2001 - 31/12/2001", and boost the document's score if the complaint value for that period is > 10.

The current index mapping looks like this:

"mappings": {
    "properties": {
        "member_name": {
            "type": "text"
        },
        "search_names": {
            "type": "text"
        },
        "complaint_periods": {
            "type": "nested",
            "properties": {
                "period": {
                    "type": "text"
                },
                "complaints": {
                    "type": "integer"
                }
            }
        }
    }
}

I'm currently reading up on nested queries as a possible solution, but I'm fairly new to ES, so I'm keen to get opinions on the types of queries and the data structure I should use to achieve this.

Any advice?

Thank you.

  • Can you provide your index mapping and anything you have tried, so that we can build a solution on it? Commented Feb 24, 2020 at 1:51
  • Updated my question with the index mapping; I'm reading into 'Nested queries' but at present I don't have a partial solution. Looking for advice on what kinds of queries to use and how the data should be structured/indexed based on the output requirements and what the data looks like. Commented Feb 24, 2020 at 2:09
  • It looks like you provided a partial mapping; I can't see the mapping for the date and complaints fields. It would also be great if you could share a Postman collection with all the required API calls so we can quickly reproduce your issue. Commented Feb 24, 2020 at 2:13
  • Apologies; my index mapping was actually not valid. ES still returned results for me, so it didn't seem to be a problem. I've updated it again. I appreciate the effort to investigate, but at this stage I wouldn't really be comfortable sharing access to the API with third parties. Commented Feb 24, 2020 at 2:48
  • We don't need access to your APIs; we just need the minimum information to debug the issue :-) Commented Feb 24, 2020 at 3:37

1 Answer

So it seems I was able to solve this with the following query:


{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Lorem*",
          "fields": ["search_names", "member_name^2"]
        }
      },
      "should": {
        "nested": {
          "path": "complaint_periods",
          "query": {
            "bool": {
              "should": [
                { "term": { "complaint_periods.period": "01/01/2001 - 31/12/2001" } }
              ]
            }
          }
        }
      }
    }
  }
}

I've switched over to a boolean (bool) query, which the docs describe as:

"A query that matches documents matching boolean combinations of other queries"

As I understand it, the first part of my query (the "must" clause) means a result has to contain a string match against my query in at least one of the two fields.

The second part is a nested query. While the period looks like a date range, it's actually stored and queried like a category, so I switched the type of complaint_periods.period from 'text' to 'keyword'. This allows me to use it in a 'term' query (exact text match, categorical).
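As a rough sketch of that mapping change, the mapping from the question can be written as a Python dict with only the period field switched to 'keyword'. The helper name build_mapping is hypothetical; the field names come from the question. Keep in mind that the type of an existing field can't be changed in place, so applying this would normally mean creating a new index with this mapping and reindexing the documents.

```python
import json


def build_mapping():
    """Return the index mapping with `period` as a keyword (hypothetical helper)."""
    return {
        "mappings": {
            "properties": {
                "member_name": {"type": "text"},
                "search_names": {"type": "text"},
                "complaint_periods": {
                    "type": "nested",
                    "properties": {
                        # keyword fields are not analyzed, so the whole
                        # period string is indexed as a single exact term
                        "period": {"type": "keyword"},
                        "complaints": {"type": "integer"},
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index
    print(json.dumps(build_mapping(), indent=2))
```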

Since the nested query sits in a 'should' clause, a result does not have to match it, but if it does, its score is boosted and it moves further up the list of results.

The docs on nested queries also have examples that would allow me to boost based on the number of complaints, e.g.:

{ "range" : {"complaint_periods.complaints" : {"gt" : 5}} }

I may need to add this later on.
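For what it's worth, here is a minimal sketch of how the period match and the complaint threshold could be combined, written as a Python function that builds the request body. The name build_query and the default boost value are assumptions for illustration, not something I've tuned. Both conditions sit inside the same nested query, so they have to hold for the same complaint_periods entry, and a constant_score wrapper gives matching documents a fixed score bump.

```python
import json


def build_query(text, period, threshold=10, boost=2.0):
    """Build the search body (hypothetical helper; defaults are illustrative).

    The name match is required; the nested clause is optional and only
    adds to the score when one period entry satisfies both conditions.
    """
    return {
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "query": text,
                        "fields": ["search_names", "member_name^2"],
                    }
                },
                "should": {
                    "nested": {
                        "path": "complaint_periods",
                        "query": {
                            "constant_score": {
                                "filter": {
                                    "bool": {
                                        "filter": [
                                            # exact match on the keyword period
                                            {"term": {"complaint_periods.period": period}},
                                            # complaints strictly above the threshold
                                            {"range": {"complaint_periods.complaints": {"gt": threshold}}},
                                        ]
                                    }
                                },
                                # fixed score contributed by a matching entry (assumed value)
                                "boost": boost,
                            }
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Lorem*", "01/01/2004 - 31/12/2004"), indent=2))
```

With the sample document from the question, the 2004 period (100 complaints) would satisfy the nested clause, while the 2001 period (exactly 10) would not, because the range uses "gt" rather than "gte".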
