3

I'm new to ArangoDB, having trouble optimizing my queries and am hoping for some help.

The query I've provided below is a real example that I'm struggling with, 758.078 ms on my dev database but on staging, with a much larger dataset, it takes 531.511 s.

I'll also provide the size of each of the edge tables I'm traversing in dev and staging. Any help is really appreciated.

for doc in document
filter repo._key == "my-key"
    for v, e, p in 3 any doc edge1, edge2, edge3
    options {uniqueVertices: 'global', bfs: true}
    filter DATE_ISO8601(p.vertices[2].date) > DATE_ISO8601("2017-09-04T00:00:01Z")
        and DATE_ISO8601(p.vertices[2].date) < DATE_ISO8601("2017-09-15T23:59:59Z")
    limit 1
    return {
        commit: p.vertices[2].hash,
        date: p.vertices[2].date,
        message: p.vertices[2].message,
        author: p.vertices[1].email,
        loc: p.vertices[3].stats.additions
    }

DEV

  • edge1: 2,638
  • edge2: 2,560
  • edge3: 386

STAGING

  • edge1: 5,438,811
  • edge2: 5,544,028
  • edge3: 423,545

2 Answers 2

2

The query is potentially slow because the filter condition

filter 
  DATE_ISO8601(p.vertices[2].date) > DATE_ISO8601("2017-09-04T00:00:01Z"
and 
  DATE_ISO8601(p.vertices[2].date) < DATE_ISO8601("2017-09-15T23:59:59Z")

is not applied during the traversal but only afterwards. Potentially this is due to the function calls (to DATE_ISO8601) in the filter conditions. If your date values are stored as numbers, can you check whether the following filter condition speeds up the query:

filter 
  p.vertices[2].date > DATE_TIMESTAMP("2017-09-04T00:00:01Z"
and 
  p.vertices[2].date < DATE_TIMESTAMP("2017-09-15T23:59:59Z")

That modified filter condition should allow pulling the filter condition inside the traversal, so it is executed earlier.

You can verify the query execution plans using db._explain(<query string goes here>); in the ArangoShell or from the web interface's AQL editor.

Sign up to request clarification or add additional context in comments.

2 Comments

Hey, thanks a lot. How would I apply my filter to the traversal? I don't know that I've seen an example of that so not sure how to restructure that piece of the query.
I think you can just replace your filter condition with the one I proposed. That will however only work if your date values are stored as numbers. You can then check the execution plan of the query using the web interface. The execution plan should change in the sense that in the "Traversals" section at the bottom of it, the filter condition should show up. Whereas in the original query the filter condition should not be shown in the "Traversals" section.
0

Probably a bit late but it will help someone. Using the DATE function will make the query much slower, so if possible remove the DATE function. For example enter image description hereenter image description here

You can see that the query filter command uses a Date function that is ~ 7s. If you do not use the date function, it will run faster than ~ 0.5s. These two lines will query the data of 2018-09-29.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.