My database currently consists of 3 document collections holding between 250k and 1.5M documents. I set my own document _keys and have added hash indexes on a few top-level fields and lists (the lists contain references to other keys or to indexed fields).

The collections A and C have an n:m relationship via B. The query I first came up with looks like this:

for a in collection_a
    filter a.name != null
    filter length(a.bs) > 0
    limit 1
    return {
          'akey': a._key
        , 'name': a.name
        , 'cs': (
            for b in collection_b
                filter b.a == a._key
                for c in collection_c
                    filter b.c == c._key
                    return c.name
        )
    }

This is excruciatingly slow. I also tried other approaches, such as changing the middle loop to for b in a.bs (bs being a list of keys of collection_b documents).

Calling explain() on the above query reports an immense cost, and getExtra() indicates that no indexes were used:

{ 
  "stats" : { 
    "writesExecuted" : 0, 
    "writesIgnored" : 0, 
    "scannedFull" : 6009930, 
    "scannedIndex" : 0 
  }, 
  "warnings" : [ ] 
}

An alternative approach is as fast as I had expected in the first place:

for a in collection_a
    filter a.name != null
    filter length(a.bs) > 0
    limit 1
    return {
          'akey': a._key
        , 'name': a.name
        , 'cs': (
            for b in a.bs
                return DOCUMENT(collection_c , DOCUMENT(collection_b, b).c ).name
        )
    }

But even here, no indexes appear to be used:

{ 
  "stats" : { 
    "writesExecuted" : 0, 
    "writesIgnored" : 0, 
    "scannedFull" : 3000, 
    "scannedIndex" : 0 
  }, 
  "warnings" : [ ] 
}

One possible explanation is that hash indexes don't work for elements of a list (or that I made a mistake when creating them). The getExtra() output of the second example hints at this.

My expectation, however, would be that ArangoDB indexes all elements of the lists (such as a.bs) and that the query optimizer realizes that indexed attributes are used in the query.

If I run for b in collection_b filter b.a == 'somekey' on its own, I get an instantaneous result, as expected. And that's just the middle for loop in isolation. I see the same behaviour when I run the innermost for loop in isolation.

Is this a bug? Is there an explanation for this behaviour? Am I doing something wrong in the first query? The AQL Examples themselves use nested for loops, so that's what I naturally ended up trying first.

  • We changed the query optimizer from 2.2 to 2.3. Which version of ArangoDB do you use? Commented Nov 26, 2014 at 12:43
  • I'm running ArangoDB 2.3.0 (amd64) on Debian testing. Commented Nov 26, 2014 at 12:52
  • Thanks for that. A partial fix for the subquery not using the index should be included in this commit: github.com/triAGENS/ArangoDB/commit/… Commented Nov 26, 2014 at 14:55
  • Thank you! Just verified that this reduces the query time from almost 30 seconds to about 6 seconds (in a totally non-scientific test)! Commented Nov 26, 2014 at 16:57
  • I also made another modification that pipes optimized plans through the same optimizer stage again, potentially enabling even more index usage: github.com/triAGENS/ArangoDB/commit/… Commented Nov 26, 2014 at 19:38

1 Answer

This has been fixed in release 2.3.2.

Clarification: the query you posted is correct. An issue in release 2.3.0 prevented indexes from being used in subqueries. It has been fixed in release 2.3.2, so the initial query you posted should use indexes properly there. If a hash index is available on the join attributes, it should be used, because the query contains only equality lookups.
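
To illustrate why index usage matters so much for this kind of nested join, here is a minimal Python sketch. It is not ArangoDB's implementation; the toy data, the function names (join_full_scan, build_hash_index, join_indexed) and the way documents are counted are all assumptions made for illustration. It runs the same equality join once with plain nested loops and once with dictionary-based "hash indexes" on b.a and c._key, and counts how many documents each variant touches.

```python
from collections import defaultdict

# Tiny stand-ins for collection_a, collection_b and collection_c (hypothetical data).
collection_a = [{"_key": "a1", "name": "first", "bs": ["b1", "b2"]}]
collection_b = [
    {"_key": "b1", "a": "a1", "c": "c1"},
    {"_key": "b2", "a": "a1", "c": "c2"},
    {"_key": "b3", "a": "a2", "c": "c1"},
]
collection_c = [
    {"_key": "c1", "name": "red"},
    {"_key": "c2", "name": "green"},
    {"_key": "c3", "name": "blue"},
]


def join_full_scan(a):
    """Nested loops without indexes: every b is inspected, and every c per matching b."""
    scanned, names = 0, []
    for b in collection_b:
        scanned += 1
        if b["a"] != a["_key"]:
            continue
        for c in collection_c:
            scanned += 1
            if b["c"] == c["_key"]:
                names.append(c["name"])
    return names, scanned


def build_hash_index(docs, field):
    """Map each value of `field` to the documents carrying it (equality lookups only)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


b_by_a = build_hash_index(collection_b, "a")
c_by_key = build_hash_index(collection_c, "_key")


def join_indexed(a):
    """Same join, but each equality filter becomes a hash lookup."""
    looked_up, names = 0, []
    for b in b_by_a.get(a["_key"], []):
        looked_up += 1
        for c in c_by_key.get(b["c"], []):
            looked_up += 1
            names.append(c["name"])
    return names, looked_up


full_names, full_scanned = join_full_scan(collection_a[0])
idx_names, idx_looked_up = join_indexed(collection_a[0])
print(full_names, full_scanned)
print(idx_names, idx_looked_up)
```

Both variants return the same names, but on this toy data the full-scan version inspects 9 documents and the indexed version only 4. With collections of hundreds of thousands of documents, the full-scan count grows with the product of the collection sizes, which is consistent with the multi-million scannedFull value in the question.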
