I am new to Elasticsearch. I have two document types, JarFileData and ClassData, which I have linked through the jarFileId field.

This is the ClassData

{
"_id" : ObjectId("59881021e950041f0c6fa1fa"),
"ClassName" : "Exception",
"jarFileId" : "JAR-0001",
"dependencies" : [ 
    {
        "dependedntClass" : "java/lang/RuntimeException",
        "methodSignature" : "<init>"
    }, 
    {
        "dependedntClass" : "java/awt/EventQueue",
        "methodSignature" : "isDispatchThread"
    }, 
    {
        "dependedntClass" : "Exception",
        "methodSignature" : "setStackTrace"
    }
]
}

This is JarFileData

{
"_id" : ObjectId("59881021e950041f0c6fa1f7"),
"jarFileName" : "Client.jar",
"jarFileId" : "JAR-0001",
"directory" : "C:\\Projects\\Test\\Application",
"version" : null,
"artifactID" : null,
"groupID" : null
}

Given a directory, I want to get all jar files in that directory and then use them to find the dependent classes in the ClassData type for those jar files.

This is the function I use in Node.js to retrieve the JarFileData documents for a given directory.

const test = function test() {
  // Search body: the first 20 documents whose directory matches the given path.
  const body = {
    size: 20,
    from: 0,
    query: {
      match: {
        directory: 'C:\\Projects\\Test\\Application'
      }
    }
  };
  return body;
};

I am trying to use the result set of the above query to query the ClassData type. I have been stuck on this part for a long time and don't know how to do it in Elasticsearch. Any help would be much appreciated.

  • What have you tried? It might also be noted that the mongodb query attempts here are actually run from the "shell". Your elasticsearch queries would need to be run from the actual language environment you want to implement in. So it would help you at least show what you are trying so your question can be answered in the context of the actual environment you are using. Which will not be the mongodb shell. Commented Aug 9, 2017 at 5:37
  • I am trying to query elastic-search with node.js. I have tried to and succeeded with simple queries but I couldn't do complex queries which I need to do. Commented Aug 9, 2017 at 5:49
  • Yes we can read. What you are being asked to provide is "some code showing your attempt at doing so". Without which this becomes a "Write my code for me" question of which many people will choose to ignore. Show an "attempt" and people are generally happy to help with "corrections/usage" etc. Commented Aug 9, 2017 at 5:53

1 Answer

Before you can go further, there are two steps that need to be done:

  • the jarFileId and dependedntClass fields should be mapped as the keyword type (if changing the mapping is a problem, you can add a keyword sub-field via multi-fields and use that sub-field in the query)
  • dependencies should be a nested object
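
The two mapping requirements above could look roughly like the following in Node.js. This is a minimal sketch, not a complete mapping: the helper name `buildClassDataProperties` and its `useMultiField` flag are hypothetical, only the `properties` part is shown, and how it is wrapped in an index-creation request depends on your Elasticsearch version. Mapping methodSignature as keyword is also an assumption.

```javascript
// Hypothetical helper: builds the `properties` section of a ClassData mapping.
// With useMultiField = true, jarFileId stays analyzed text and gets a
// `keyword` sub-field, which is then queried as `jarFileId.keyword`.
function buildClassDataProperties(useMultiField = false) {
  const jarFileId = useMultiField
    ? { type: 'text', fields: { keyword: { type: 'keyword' } } }
    : { type: 'keyword' };

  return {
    jarFileId,
    dependencies: {
      // `nested` keeps each dependency entry as its own hidden document,
      // which is what the nested aggregation needs.
      type: 'nested',
      properties: {
        dependedntClass: { type: 'keyword' },
        methodSignature: { type: 'keyword' } // assumption, not required by the answer
      }
    }
  };
}

console.log(JSON.stringify(buildClassDataProperties(), null, 2));
```

If the multi-field variant is used, the field names in the queries would need the `.keyword` suffix.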

Looking at your data, the joining element between these two types of documents is the jarFileId field. Suppose your existing query returned, for example, this list of jars:

[{"jarFileId": "JAR-0001"},{"jarFileId": "JAR-0002"}]

With this information, you can use this query:

{
   "size":0,
   "query":{
      "constant_score":{
         "filter":{
            "terms":{ "jarFileId":["JAR-0001","JAR-0002"] }
         }
      }
   },
   "aggs":{
      "filtered":{
         "filter":{
            "constant_score":{
               "filter":{
                   "terms":{ "jarFileId":["JAR-0001","JAR-0002"] }
               }
            }
         },
         "aggs":{
            "dependent":{
               "nested":{
                  "path":"dependencies"
               },
               "aggs":{
                  "classes":{
                     "terms":{
                        "field":"dependencies.dependedntClass"
                     }
                  }
               }
            }
         }
      }
   }
}
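
Since the list of jar IDs comes from the first query, the body above would normally be built in code rather than typed by hand. A minimal Node.js sketch, where the function name `buildDependencyQuery` and the `bucketSize` parameter are assumptions made for illustration:

```javascript
// Hypothetical helper: builds the search body shown above for any list of IDs.
function buildDependencyQuery(jarFileIds, bucketSize = 10) {
  // The same terms filter is used by the query and by the filter aggregation.
  const jarFilter = {
    constant_score: { filter: { terms: { jarFileId: jarFileIds } } }
  };

  return {
    size: 0, // only the aggregation result is needed, not the hits
    query: jarFilter,
    aggs: {
      filtered: {
        filter: jarFilter,
        aggs: {
          dependent: {
            nested: { path: 'dependencies' },
            aggs: {
              classes: {
                terms: {
                  field: 'dependencies.dependedntClass',
                  size: bucketSize
                }
              }
            }
          }
        }
      }
    }
  };
}

console.log(JSON.stringify(buildDependencyQuery(['JAR-0001', 'JAR-0002']), null, 2));
```

A terms aggregation returns only the top 10 buckets by default, so `bucketSize` may need to be raised if the jars depend on more distinct classes than that.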

And as a result you'll get:

{
    ...,
    "aggregations": {
        "filtered": {
            "doc_count": 1,
            "dependent": {
                "doc_count": 3,
                "classes": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "core/internal/TrackingEventQueue$TrackingException",
                            "doc_count": 1
                        },
                        {
                            "key": "java/awt/EventQueue",
                            "doc_count": 1
                        },
                        {
                            "key": "java/lang/RuntimeException",
                            "doc_count": 1
                        }
                    ]
                }
            }
        }
    }
}
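
To turn that response into a plain list of class names in Node.js, the bucket keys can be read out as below. This is only a sketch: `extractDependentClasses` is a hypothetical name, and the path into the response assumes the aggregation names used in the query (`filtered`, `dependent`, `classes`).

```javascript
// Hypothetical helper: returns the bucket keys of the `classes` aggregation,
// or an empty array when any level of the expected structure is missing.
function extractDependentClasses(response) {
  const classes =
    response.aggregations &&
    response.aggregations.filtered &&
    response.aggregations.filtered.dependent &&
    response.aggregations.filtered.dependent.classes;

  return classes && Array.isArray(classes.buckets)
    ? classes.buckets.map((bucket) => bucket.key)
    : [];
}

// Usage with a trimmed-down copy of the response above.
const sample = {
  aggregations: {
    filtered: {
      dependent: {
        classes: {
          buckets: [
            { key: 'java/awt/EventQueue', doc_count: 1 },
            { key: 'java/lang/RuntimeException', doc_count: 1 }
          ]
        }
      }
    }
  }
};
console.log(extractDependentClasses(sample));
```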

With your current model, it is not possible to do this with one query: Elasticsearch does not have a join mechanism. A single document should contain all the information Elasticsearch needs to decide whether it matches the query. This is nicely described here. So either you go with application-side joins (the link has an example similar to yours) or you denormalize your data, if search performance is the core issue here. The only built-in "join mechanism" that I'm aware of is Term Filter Lookup, but it only lets you operate on the id field.
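
An application-side join for this case could be sketched in Node.js as follows. Everything named here is an assumption made for illustration: `search` stands for whatever function sends a request body to Elasticsearch and resolves with the parsed response, and the index names `jarfiledata` and `classdata` are placeholders. The first response is assumed to have the usual `hits.hits[]._source` shape. The second body leaves out the `filtered` aggregation, because aggregations only see the documents that the query already matched.

```javascript
// Application-side join: query 1 collects jarFileIds, query 2 uses them.
// `search(index, body)` is a hypothetical function supplied by the caller.
async function findDependentClasses(search, directory) {
  // Step 1: find the jar files stored in the given directory.
  const jars = await search('jarfiledata', {
    size: 20,
    query: { match: { directory } }
  });

  // Collect the distinct jarFileId values from the hits.
  const ids = [...new Set(jars.hits.hits.map((hit) => hit._source.jarFileId))];
  if (ids.length === 0) return [];

  // Step 2: aggregate the dependent classes of exactly those jars.
  const classes = await search('classdata', {
    size: 0,
    query: { constant_score: { filter: { terms: { jarFileId: ids } } } },
    aggs: {
      dependent: {
        nested: { path: 'dependencies' },
        aggs: {
          classes: { terms: { field: 'dependencies.dependedntClass' } }
        }
      }
    }
  });

  return classes.aggregations.dependent.classes.buckets.map((b) => b.key);
}
```

With a client library, `search` would typically be a small wrapper around that library's search call. Passing it in as a parameter also makes the join easy to try out against canned responses.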

2 Comments

Thank you very much. The thing is, I want a way to use the result set of the first query in the second query. I have managed to run them separately, as you have done above, but I couldn't use the result set of the first query in the second query. Thank you once again.
This is what I have looked for, I will try to update my model and do it. Thank you so much.
