
I have many indexed documents such as this one:

{
   "_index":"myindex",
   "_type":"somedata",
   "_id":"31d3255d-67b4-40e6-b9d4-637383eb72ad",
   "_version":1,
   "_score":1,
   "_source":{
      "otherID":"b4c95332-daed-49ae-99fe-c32482696d1c",
      "data":[
         {
            "data":"d2454d41-a74e-43af-b3b0-0febeaf67a99",
            "iD":"9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
         },
         {
            "data":"some text",
            "iD":"c554b8ce-c873-4fef-b306-ec65d2f40394"
         },
         {
            "data":"5256983c-ef69-4363-9787-97074297c646",
            "iD":"8c90e2be-6042-4450-b0fd-0732900f8f65"
         },
         {
            "data":"other text",
            "iD":"8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
         },
         {
            "data":"3",
            "iD":"c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
         }
      ],
      "iD":"31d3255d-67b4-40e6-b9d4-637383eb72ad"
   }
}  

Its type mapping is configured this way:

{
   "somedata":{
      "dynamic_templates":[
         {
            "defaultIDs":{
               "match_pattern":"regex",
               "mapping":{
                  "index":"not_analyzed",
                  "type":"string"
               },
               "match":".*(id|ID|iD)"
            }
         }
      ],
      "properties":{
         "otherID":{
            "index":"not_analyzed",
            "type":"string"
         },
         "data":{
            "properties":{
               "data":{
                  "type":"string"
               },
               "iD":{
                  "index":"not_analyzed",
                  "type":"string"
               }
            }
         },
         "iD":{
            "index":"not_analyzed",
            "type":"string"
         }
      }
   }
}  

I wish to be able to retrieve a string concatenation of data values based on their IDs.
For example, given the IDs c554b8ce-c873-4fef-b306-ec65d2f40394 and 8d8f8a61-02d6-4d3e-9912-9ebb5d213c15, I would like to retrieve "some text other text".
These IDs repeat in other documents of the same type with different data.

If this is not possible (which I suspect is the case), I would like to at least retrieve a partial array containing only my requested data.
Those arrays can become large (as can the number of documents), and I only need one or two elements from each hit.

If neither of my requests is possible, how would you suggest changing my mappings to meet my needs?

Thanks in advance, Jonathan.

2 Answers

3

I have found a way to do exactly what I needed without changing my data structure.
(I actually did end up changing my data structure, but for reasons of space and efficiency).

All you have to do is enjoy the Groovy goodness Elasticsearch has to offer:

{
    "query" : { "term" : { "otherID" : "b4c95332-daed-49ae-99fe-c32482696d1c" } },
    "script_fields" : { "requestedFields" : { "script" :  "_source.data.findAll({ it.iD == 'c554b8ce-c873-4fef-b306-ec65d2f40394' || it.iD == '8d8f8a61-02d6-4d3e-9912-9ebb5d213c15'}).data.join(' ')" } }
}

Just goes to show how powerful Elasticsearch really is.
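
For readers unfamiliar with Groovy, the filter-then-join logic of that script can be sketched in plain Python, applied client-side to a hit's `_source`. This is only an illustration of what the script computes, not something Elasticsearch runs; the names `concat_data`, `wanted_ids`, and `sep` are made up for the example.

```python
def concat_data(source, wanted_ids, sep=" "):
    """Join the `data` values of the entries whose `iD` is in wanted_ids.

    Entries keep the order they have in the document's array, which is
    also what a filter over the `_source.data` list produces.
    """
    wanted = set(wanted_ids)
    matches = [
        entry["data"]
        for entry in source.get("data", [])
        if entry.get("iD") in wanted
    ]
    return sep.join(matches)


# The sample document's _source from the question.
sample_source = {
    "otherID": "b4c95332-daed-49ae-99fe-c32482696d1c",
    "data": [
        {"data": "d2454d41-a74e-43af-b3b0-0febeaf67a99", "iD": "9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"},
        {"data": "some text", "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"},
        {"data": "5256983c-ef69-4363-9787-97074297c646", "iD": "8c90e2be-6042-4450-b0fd-0732900f8f65"},
        {"data": "other text", "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"},
        {"data": "3", "iD": "c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"},
    ],
    "iD": "31d3255d-67b4-40e6-b9d4-637383eb72ad",
}

requested = [
    "c554b8ce-c873-4fef-b306-ec65d2f40394",
    "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15",
]
print(concat_data(sample_source, requested))  # some text other text
```

Doing this inside a script field keeps the large array from being shipped to the client; the Python version is only useful if you are already fetching the full `_source` anyway.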


1

I cannot help you with the field concatenation (maybe it's possible with scripting, but I'm not experienced enough with it; I would assume a new field would have to be generated, etc.), but I can show you how to retrieve only the partial data.

This requires at least ES 1.5, because it uses inner_hits, and you need to change the mapping.

I added "type": "nested" and "include_in_parent": true to your data field (note that in this example the index is named somedata and the type sometype):

DELETE somedata
PUT somedata
PUT somedata/sometype/_mapping
{
   "sometype":{
      "dynamic_templates":[
         {
            "defaultIDs":{
               "match_pattern":"regex",
               "mapping":{
                  "index":"not_analyzed",
                  "type":"string"
               },
               "match":".*(id|ID|iD)"
            }
         }
      ],
      "properties":{
         "otherID":{
            "index":"not_analyzed",
            "type":"string"
         },
         "data":{
            "type": "nested",
            "include_in_parent": true,
            "properties":{
               "data":{
                  "type":"string"
               },
               "iD":{
                  "index":"not_analyzed",
                  "type":"string"
               }
            }
         },
         "iD":{
            "index":"not_analyzed",
            "type":"string"
         }
      }
   }
}  

Now indexing your document:

PUT somedata/sometype/1
{
      "otherID":"b4c95332-daed-49ae-99fe-c32482696d1c",
      "data":[
         {
            "data":"d2454d41-a74e-43af-b3b0-0febeaf67a99",
            "iD":"9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
         },
         {
            "data":"some text",
            "iD":"c554b8ce-c873-4fef-b306-ec65d2f40394"
         },
         {
            "data":"5256983c-ef69-4363-9787-97074297c646",
            "iD":"8c90e2be-6042-4450-b0fd-0732900f8f65"
         },
         {
            "data":"other text",
            "iD":"8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
         },
         {
            "data":"3",
            "iD":"c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
         }
      ],
      "iD":"31d3255d-67b4-40e6-b9d4-637383eb72ad"
   }

And here's how you can match and retrieve with inner_hits:

POST somedata/sometype/_search
{
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "bool": {
          "should": [
            {
            "term": {
              "data.iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
            }
            },
            {
            "term": {
              "data.iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
            }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
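
If the list of requested IDs varies per request, the body above can be generated rather than written by hand. The following is a minimal sketch that only builds the same JSON structure as a Python dict; the helper name `build_inner_hits_query` and its parameters are hypothetical, and sending the body to the cluster is left to whatever client you use.

```python
import json


def build_inner_hits_query(ids, path="data", id_field="iD"):
    """Build the nested bool/should query with inner_hits for a list of IDs.

    One `term` clause is emitted per requested ID, mirroring the
    hand-written request body.
    """
    should = [{"term": {f"{path}.{id_field}": wanted}} for wanted in ids]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"should": should}},
                "inner_hits": {},
            }
        }
    }


body = build_inner_hits_query([
    "c554b8ce-c873-4fef-b306-ec65d2f40394",
    "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15",
])
print(json.dumps(body, indent=2))
```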

In the result, look at the array under hits.hits[0].inner_hits.data.hits.hits. It contains only your two requested matches, and each entry's text is in its _source.data:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.5986179,
      "hits": [
         {
            "_index": "somedata",
            "_type": "sometype",
            "_id": "1",
            "_score": 0.5986179,
            "_source": {
               "otherID": "b4c95332-daed-49ae-99fe-c32482696d1c",
               "data": [
                  {
                     "data": "d2454d41-a74e-43af-b3b0-0febeaf67a99",
                     "iD": "9362f2eb-9bd7-4924-8b0e-77c27bb0aa56"
                  },
                  {
                     "data": "some text",
                     "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
                  },
                  {
                     "data": "5256983c-ef69-4363-9787-97074297c646",
                     "iD": "8c90e2be-6042-4450-b0fd-0732900f8f65"
                  },
                  {
                     "data": "other text",
                     "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
                  },
                  {
                     "data": "3",
                     "iD": "c880bfdf-eb4b-4c80-9871-fd44e06b2ed2"
                  }
               ],
               "iD": "31d3255d-67b4-40e6-b9d4-637383eb72ad"
            },
            "inner_hits": {
               "data": {
                  "hits": {
                     "total": 2,
                     "max_score": 0.5986179,
                     "hits": [
                        {
                           "_index": "somedata",
                           "_type": "sometype",
                           "_id": "1",
                           "_nested": {
                              "field": "data",
                              "offset": 3
                           },
                           "_score": 0.5986179,
                           "_source": {
                              "data": "other text",
                              "iD": "8d8f8a61-02d6-4d3e-9912-9ebb5d213c15"
                           }
                        },
                        {
                           "_index": "somedata",
                           "_type": "sometype",
                           "_id": "1",
                           "_nested": {
                              "field": "data",
                              "offset": 1
                           },
                           "_score": 0.5986179,
                           "_source": {
                              "data": "some text",
                              "iD": "c554b8ce-c873-4fef-b306-ec65d2f40394"
                           }
                        }
                     ]
                  }
               }
            }
         }
      ]
   }
}

Now, inner_hits is fairly new and the documentation also states:

Warning: This functionality is experimental and may be changed or removed completely in a future release.

YMMV.

Another thing to watch out for: the inner_hits are sorted by score. In your original document the elements are in an ordered array, but that order is not preserved in the inner_hits list. If you need them in the same order in the inner_hits, I think you need to add a separate field for sorting (it could just be the array index) and sort the inner_hits by it.
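
As an alternative to a dedicated sort field, the order can be restored client-side. In the sample response above, each inner hit carries a `_nested.offset` (1 and 3) that matches the element's position in the original array. The sketch below assumes that holds in general and that the response has the shape shown above; the function name `concat_inner_hits` is hypothetical.

```python
def concat_inner_hits(response, path="data", sep=" "):
    """Return {_id: joined text}, with inner hits put back in array order.

    Assumes every inner hit has `_nested.offset` and `_source.data`,
    as in the sample response.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        inner = (
            hit.get("inner_hits", {})
            .get(path, {})
            .get("hits", {})
            .get("hits", [])
        )
        ordered = sorted(inner, key=lambda h: h["_nested"]["offset"])
        result[hit["_id"]] = sep.join(h["_source"]["data"] for h in ordered)
    return result


# Trimmed copy of the sample response: only the keys the function reads.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "data": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "data", "offset": 3},
                                    "_source": {"data": "other text"},
                                },
                                {
                                    "_nested": {"field": "data", "offset": 1},
                                    "_source": {"data": "some text"},
                                },
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(concat_inner_hits(sample_response))  # {'1': 'some text other text'}
```

This also covers the concatenation from the question, at the cost of doing the join in the client rather than in Elasticsearch.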

6 Comments

Thank you for taking the time to respond @mark. You certainly pointed me in the right direction. It's a bit discouraging to see that warning though. And it seems getting that concatenation would be so complex, if it's even possible, that it's not worth it... (sigh). If no other surprising answer comes along in the next few hours, I'll accept yours. Thanks again :-)
Without knowing your full intent with the data structure, maybe it makes sense to store it in ES in a different way, optimized for your case: i.e., index the elements of the data[] array as their own type, attach iD and orderID to each, and get away with a query without inner_hits (this still does not solve the sorting and concatenation).
Thanks @mark. I realized my data structure needed a change for many reasons. Nevertheless I still need that concatenation ability and it turns out there are many ways to achieve this. I have posted the answer here if you're interested in knowing :)
Nice! And the order of concatenation, not an issue?
Can always add .sort() :-)