Handling string and string array pair effectively

Question

I'm saving TBs worth of data in ES in the following manner:

"class": {
  "type": "nested",
  "properties": {
    "name": {"type": "string"},
    "methods": {
      "properties": {
        "name": {"type": "string"}
      }
    }
  }
}

Simply put, I'm saving data as (class1, [method1, method2, ...]), (class2, [method3, method4, ...]), and so on.

I saw in the ES docs that all data is reduced to Lucene key-value pairs; I'm not sure whether that is relevant here.

Would it decrease search latency if I arranged the data as follows: {class1, method1}, {class1, method2}, ..., {class2, method3}, ...?

Sample query: search for a given class name and method name pair, and show all docs in the index that contain that pair.

I'd appreciate any help. Please suggest a better way to handle this if there is one.

It might help if you provided some sample documents and some sample queries you hope to run against your data. Designing your data layout largely depends on how you plan to query your data. As a general rule it helps to flatten and denormalize your data. Avoiding nested docs if at all possible will also give you more flexibility and better performance.
– BrookeB, Commented Feb 21, 2016 at 4:12
Keep in mind, you can store arrays of values without using nested docs...
– BrookeB, Commented Feb 21, 2016 at 4:17
I don't want to lose the mapping of class1 -> [method1, method2, ...]; that's why I'm using the nested type. I'm not sure how I can achieve that otherwise.
– Kumar Vikramjeet, Commented Feb 21, 2016 at 4:23
Are there other fields besides class and method that you are storing? It would still be helpful to see a couple of complete examples of the data you are trying to store. I'm assuming, then, that 'class' is one of many properties of some other "parent" document that you haven't included in your question. Is that correct?
– BrookeB, Commented Feb 21, 2016 at 14:29
I have many types in an index. One of them is the class type, which stores class names and all the methods in each class. The other types are stored separately in the same index. An example of data stored in the class type would be {"class":"Java.lang.class1", "method":"dosomething"}.
– Kumar Vikramjeet, Commented Feb 21, 2016 at 20:33

BrookeB · Accepted Answer · 2016-02-22 17:14:57Z

Between your two options (i.e. one nested doc per class vs. one nested doc per class and method pair), there should not be a noticeable difference in search times. Personally, I would prefer the first option, since it seems a better model of your data. It also means fewer documents in total. (Keep in mind that a "nested" doc in ES is really just another true document in Lucene under the hood. ES simply keeps the nested docs located directly next to the parent doc for efficient relationship management.)
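To illustrate why nested docs keep each class paired with its own methods when one parent holds several classes, here is a toy model in plain Python. It does not call Elasticsearch or Lucene; the function names `flattened_match` and `nested_match` are hypothetical, and the model only mimics the documented behavior that a plain object array is flattened per field while nested objects are matched one at a time.

```python
# Toy model only: mimics how object arrays are matched, without using Elasticsearch.

doc = {
    "classes": [
        {"class": "Java.lang.class1", "method": ["myMethod1", "myMethod2"]},
        {"class": "Java.lang.class2", "method": ["myMethod3", "myMethod4"]},
    ]
}


def as_list(value):
    """Treat a single value and a list of values the same way."""
    return value if isinstance(value, list) else [value]


def flattened_match(doc, class_name, method_name):
    """Plain object mapping: values from all objects are merged per field."""
    all_classes = {c for entry in doc["classes"] for c in as_list(entry["class"])}
    all_methods = {m for entry in doc["classes"] for m in as_list(entry["method"])}
    return class_name in all_classes and method_name in all_methods


def nested_match(doc, class_name, method_name):
    """Nested mapping: both terms must match inside the same object."""
    return any(
        class_name in as_list(entry["class"])
        and method_name in as_list(entry["method"])
        for entry in doc["classes"]
    )


# class1 does not own myMethod3, yet the flattened model reports a match.
print(flattened_match(doc, "Java.lang.class1", "myMethod3"))  # True
print(nested_match(doc, "Java.lang.class1", "myMethod3"))     # False
print(nested_match(doc, "Java.lang.class2", "myMethod3"))     # True
```

The false positive in the flattened case is the cross-object match that the nested type is meant to prevent.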

ES has no dedicated array type: any field can hold one or more values, so it is well suited to the first option. Assume an example mapping like this:

PUT /test_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "someField": { "type": "string" },
        "classes": {
          "type": "nested", 
          "properties": {
            "class": { "type":"string", "index":"not_analyzed" },
            "method": { "type": "string", "index":"not_analyzed" }
          }
        }
      }
    }
  }
}

You can then input your documents, such as:

POST test_index/my_type
{
  "someField":"A",
  "classes": {
    "class":"Java.lang.class1",
    "method":["myMethod1","myMethod2"]
  }
}

POST test_index/my_type
{
  "someField":"B",
  "classes": {
    "class":"Java.lang.class2",
    "method":["myMethod3","myMethod4"]
  }
}
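If a single parent document needs to hold several classes, `classes` can also be an array with one object per class. As a minimal sketch, the request body could be assembled in Python like this; the helper name `build_document` and the sample input are assumptions for illustration, and the code only builds the JSON body without sending it anywhere.

```python
import json


def build_document(some_field, class_methods):
    """Build one parent document with one nested object per class.

    class_methods maps a class name to an iterable of its method names.
    """
    return {
        "someField": some_field,
        "classes": [
            {"class": class_name, "method": list(methods)}
            for class_name, methods in class_methods.items()
        ],
    }


sample_doc = build_document(
    "A",
    {
        "Java.lang.class1": ["myMethod1", "myMethod2"],
        "Java.lang.class2": ["myMethod3", "myMethod4"],
    },
)

# Body that could be sent with POST test_index/my_type
print(json.dumps(sample_doc, indent=2))
```

Each entry in `classes` would be indexed as its own nested doc, so every class stays tied to its own array of methods.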

In order to satisfy your sample query, you can simply use a bool filter inside a nested query. For example:

GET test_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "classes",
      "query": {
        "bool": {
          "filter": [
            { "term": {"classes.class":"Java.lang.class2"} },
            { "term": {"classes.method":"myMethod3"} }
          ]
        }
      }
    }
  }
}

This would return the second document from my example.
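For arbitrary class and method pairs, the same query body can be generated rather than written by hand. The following is a sketch only: `build_pair_query` is a hypothetical helper, the field names follow the example mapping above, and the code builds the body without talking to a cluster.

```python
import json


def build_pair_query(class_name, method_name, path="classes"):
    """Build a nested query matching one class name and one of its methods."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{path}.class": class_name}},
                            {"term": {f"{path}.method": method_name}},
                        ]
                    }
                },
            }
        }
    }


pair_query = build_pair_query("Java.lang.class2", "myMethod3")

# Body that could be sent with GET test_index/my_type/_search
print(json.dumps(pair_query, indent=2))
```

Because both `term` filters sit inside the same nested query, they have to match within a single nested object, which is what keeps the class and method paired.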

edited Feb 22, 2016 at 17:14

answered Feb 22, 2016 at 4:14

3 Comments

Kumar Vikramjeet Over a year ago

Thanks! Just to make it clearer: I have to save many class-methods pairs in one type of the index, which means the mapping of each class to all its methods has to be retained. It's not one class per doc; I have to store many classes in a doc, each mapped to its own array of methods.

BrookeB Over a year ago

I see, that does help. I was confused by your example class doc, which only had the two fields. In that case, it seems nested is necessary. My preference stays the same: use the nested doc and preserve the array of methods. In my opinion, there is no need to take the product of classes and methods and generate separate pairs.

BrookeB Over a year ago

@KumarVikramjeet, I edited my answer to take your nested docs into consideration. Hope it helps!
