1

At first let me introduce my use case - I have a collection with documents where I store my XML requests and corresponding responses. Also each document has plenty of accompanying properties and some of them are indexed, but request and response aren't.

Whenever I search using indexed field the performance is sufficient. But there are situations where I have to prepare search using a regular expression basing on the request or response value.

For now I do something like this:

db.traffic.find(
    { $or: 
        [ { request: { $regex: "some.* code=\"123\"} },
          { response: { $regex: "some.* code=\"123\"} }] })

and I translated this into Java code. But querying is to slow and it takes significant amount of time comparing to other queries.

I can see two solutions:

  1. indexing requests and responses - but I suppose that this is not a good idea as they are really long and most probably index will be huge.
  2. querying using some indexed field first and then applying already mentioned query but in the descending order of found records and picking the very first found so I would like to do something like

    db.traffic.find({"conversationID": { $regex: "vendorName" }}).sort({"counter": -1}) 
        .findOne(
            { $or: 
                [ { request: { $regex: "some.* code=\"123\"} },
                  { response: { $regex: "some.* code=\"123\"} }] })
    

So wrapping up - my question is: should I choose the simpler solution which is indexing requests and responses? And what impact will it have on the size of my index?

Or should I choose the second way? But is my code correct and does it what I want?

1
  • 1
    Whatever anyone says, the best answer will always be to try both and benchmark Commented Nov 26, 2015 at 23:25

2 Answers 2

1

Have you tried an explain on both methods?

Use the mongo shell to test queries, and add explain before the query, so:

db.traffic.explain()...

try both and you should get some information that indicates the direction.

Sign up to request clarification or add additional context in comments.

Comments

0

In the end I tried the second solution but I had slightly change it because of the fact that I can't run findOne on the query result. But I found the equivalent syntax.

So it looks now sth like this:

db.traffic.findOne($query: { $or: 
                            [ { request: { $regex: "some.* code=\"123\"} },
                              { response: { $regex: "some.* code=\"123\"} }] },
                   $orderby: { "counter": -1})

and the performance is much better now.

Also I used explain to check the real "speed".

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.