
I'm saving TBs worth of data in ES in the following manner:

"class": {
  "type": "nested",
  "properties": {
    "name": { "type": "string" },
    "methods": {
      "properties": {
        "name": { "type": "string" }
      }
    }
  }
}

Simply put, I'm saving data as (class1, [method1, method2,...]), (class2, [method3, method4,...]) ...

I saw in the ES docs that all data is reduced to Lucene key-value pairs; I'm not sure whether that is relevant here.

Would it decrease search latency if I arranged the data as follows: {class1, method1}, {class1, method2}, ... {class2, method3}, ...?

Sample query: search for a given class name and method name pair, and show all docs in the index that contain that pair.

I'd appreciate any help. Please suggest a better way to handle this if there is one.

  • It might help if you provided some sample documents and some sample queries you hope to run against your data. Designing your data layout largely depends on how you plan to query your data. As a general rule it helps to flatten and denormalize your data. Avoiding nested docs if at all possible will also give you more flexibility and better performance. Commented Feb 21, 2016 at 4:12
  • Keep in mind, you can store arrays of values without using nested docs... Commented Feb 21, 2016 at 4:17
  • I don't want to lose the mapping of class1 -> [method1, method2, ...]; that's why I'm using the nested type. I'm not sure how I can achieve that otherwise. Commented Feb 21, 2016 at 4:23
  • Are there other fields besides class and method that you are storing? It would still be helpful to see a couple complete examples of the data you are trying to store. I'm assuming, then, that 'class' is one of many other properties of some other "parent" document that you haven't included in your question. Is that correct? Commented Feb 21, 2016 at 14:29
  • I have many types in an index. One of them is the class type, which stores class names and all the methods in each class. Other types are stored in that index too, but each is stored separately under its own type. An example of data stored in the class type: {"class":"Java.lang.class1", "method":"dosomething"} Commented Feb 21, 2016 at 20:33

1 Answer

Between your two options (one nested doc per class vs. one nested doc per class and method pair), there should not be a noticeable difference in search times. Personally, I would prefer the first option, since it seems a better model of your data, and it means fewer documents in total. Keep in mind that a "nested" doc in ES is really just another full document in Lucene under the hood; ES simply keeps the nested docs stored directly next to the parent doc so the relationship can be resolved efficiently.
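As a rough illustration of the "fewer documents" point, here is a small Python sketch that counts the Lucene documents each layout would produce. It assumes a single parent document holding every class, and that each nested entry becomes one extra Lucene document; the class and method names are made up for illustration.

```python
# Hypothetical sample data: class name -> list of its methods.
classes = {
    "Java.lang.class1": ["myMethod1", "myMethod2"],
    "Java.lang.class2": ["myMethod3", "myMethod4"],
}

def lucene_docs_per_class(classes):
    """Option 1: one nested entry per class, plus the parent document."""
    return 1 + len(classes)

def lucene_docs_per_pair(classes):
    """Option 2: one nested entry per (class, method) pair, plus the parent."""
    return 1 + sum(len(methods) for methods in classes.values())

print(lucene_docs_per_class(classes))  # 3
print(lucene_docs_per_pair(classes))   # 5
```

The gap widens as the number of methods per class grows, which is the main reason to keep the array of methods rather than expand it into pairs.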

Internally, ES treats every value as an array, so it is certainly suited to handle the first option. Assuming an example mapping like this:

PUT /test_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "someField": { "type": "string" },
        "classes": {
          "type": "nested", 
          "properties": {
            "class": { "type":"string", "index":"not_analyzed" },
            "method": { "type": "string", "index":"not_analyzed" }
          }
        }
      }
    }
  }
}

You can then index your documents, such as:

POST test_index/my_type
{
  "someField":"A",
  "classes": {
    "class":"Java.lang.class1",
    "method":["myMethod1","myMethod2"]
  }
}

POST test_index/my_type
{
  "someField":"B",
  "classes": {
    "class":"Java.lang.class2",
    "method":["myMethod3","myMethod4"]
  }
}

In order to satisfy your sample query, you can simply use a bool filter inside a nested query. For example:

GET test_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "classes",
      "query": {
        "bool": {
          "filter": [
            { "term": {"classes.class":"Java.lang.class2"} },
            { "term": {"classes.method":"myMethod3"} }
          ]
        }
      }
    }
  }
}

This would return the second document from my example.
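To see why the nested type matters once a single document holds more than one class, here is a toy Python model of the matching semantics. It is not Elasticsearch code; it only mimics how a plain object array pools values per field while a nested field keeps each entry separate. The document with someField "C" is a hypothetical example.

```python
# Hypothetical document holding two classes, each with its own methods.
doc = {
    "someField": "C",
    "classes": [
        {"class": "Java.lang.class1", "method": ["myMethod1", "myMethod2"]},
        {"class": "Java.lang.class2", "method": ["myMethod3", "myMethod4"]},
    ],
}

def flattened_match(doc, class_name, method_name):
    """Plain object array: values from all entries are pooled per field,
    so the link between a class and its own methods is lost."""
    all_classes = {entry["class"] for entry in doc["classes"]}
    all_methods = {m for entry in doc["classes"] for m in entry["method"]}
    return class_name in all_classes and method_name in all_methods

def nested_match(doc, class_name, method_name):
    """Nested type: both terms must match inside the same entry."""
    return any(
        entry["class"] == class_name and method_name in entry["method"]
        for entry in doc["classes"]
    )

# class1 does not own myMethod3, yet the pooled view reports a match.
print(flattened_match(doc, "Java.lang.class1", "myMethod3"))  # True
print(nested_match(doc, "Java.lang.class1", "myMethod3"))     # False
print(nested_match(doc, "Java.lang.class2", "myMethod3"))     # True
```

The false positive in the first call is what the nested mapping avoids, and it does so without expanding every class into separate class and method pairs.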

3 Comments

Thanks! To be clearer: I have to save many class-method pairs in one type of the index, which means the mapping of a class to all its methods has to be retained. It's not one class per doc; I have to store many classes in a doc, each mapped to its own array of methods.
I see, that does help. I was confused by your example class doc which only had the two fields. In that case, it seems nested is necessary. My preference would stay the same then with the nested doc, and preserve the array of methods. No need to take the product of classes and methods and generate separate pairs, in my opinion.
@KumarVikramjeet, I edited my answer to take your nested docs into consideration. Hope it helps!
