My database currently consists of 3 document collections, each holding between 250k and 1.5M documents. I set my own document _keys and have added hash indexes on a few top-level fields and lists (the lists contain references to other keys or to other indexed fields).
The collections A and C have an n:m relationship via B. The query I first came up with looks like this:
    for a in collection_a
      filter a.name != null
      filter length(a.bs) > 0
      limit 1
      return {
        'akey': a._key
        , 'name': a.name
        , 'cs': (
          for b in collection_b
            filter b.a == a._key
            for c in collection_c
              filter b.c == c._key
              return c.name
        )
      }
This is excruciatingly slow. I also tried other approaches, such as changing the middle loop so that it iterates over a.bs instead of collection_b (bs being a list of keys of collection_b documents).
Printing the explain() output for the above query shows an immense estimated cost, and getExtra() indicates that no indexes were used:
    {
      "stats" : {
        "writesExecuted" : 0,
        "writesIgnored" : 0,
        "scannedFull" : 6009930,
        "scannedIndex" : 0
      },
      "warnings" : [ ]
    }
An alternative approach is as fast as I had expected the first query to be:
    for a in collection_a
      filter a.name != null
      filter length(a.bs) > 0
      limit 1
      return {
        'akey': a._key
        , 'name': a.name
        , 'cs': (
          for b in a.bs
            return DOCUMENT(collection_c, DOCUMENT(collection_b, b).c).name
        )
      }
But even here, no indexes appear to be used:
    {
      "stats" : {
        "writesExecuted" : 0,
        "writesIgnored" : 0,
        "scannedFull" : 3000,
        "scannedIndex" : 0
      },
      "warnings" : [ ]
    }
One possible explanation is that hash indexes don't work for elements of a list (or that I made a mistake when creating them). The getExtra() output of the second example would hint at this.
My expectation, however, would be that ArangoDB indexes all elements of the lists (such as a.bs) and that the query optimizer recognizes that indexed attributes are used in the query.
If I run the middle loop in isolation (for b in collection_b filter b.a == 'somekey'), I get an instantaneous result, as expected. The same is true when I run the innermost loop in isolation.
Is this a bug? Is there an explanation for this behaviour? Am I doing something wrong in the first query? The AQL examples themselves use nested for loops, so that's what I naturally ended up trying first.
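To make that expectation concrete, here is a rough Python model of the two access patterns I have in mind: nested full scans versus one hash lookup per join step. This is not ArangoDB code and says nothing about how its optimizer actually works. The toy data and the helper names (build_hash_index, join_with_full_scans, join_with_hash_lookups) are made up for illustration.

```python
from collections import defaultdict

# Toy stand-ins for collection_b and collection_c (hypothetical data).
collection_b = [
    {"_key": f"b{i}", "a": f"a{i % 10}", "c": f"c{i % 50}"} for i in range(1000)
]
collection_c = [{"_key": f"c{i}", "name": f"name{i}"} for i in range(50)]


def join_with_full_scans(a_key):
    """Nested loops without indexes: each step scans a whole collection."""
    scanned = 0
    names = []
    for b in collection_b:
        scanned += 1
        if b["a"] != a_key:
            continue
        for c in collection_c:
            scanned += 1
            if c["_key"] == b["c"]:
                names.append(c["name"])
    return names, scanned


def build_hash_index(docs, field):
    """Map each value of `field` to the documents that carry it."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


b_by_a = build_hash_index(collection_b, "a")
c_by_key = build_hash_index(collection_c, "_key")


def join_with_hash_lookups(a_key):
    """Same join, but each step is a hash lookup instead of a scan."""
    touched = 0
    names = []
    for b in b_by_a.get(a_key, []):
        touched += 1
        for c in c_by_key.get(b["c"], []):
            touched += 1
            names.append(c["name"])
    return names, touched


names_scan, scanned = join_with_full_scans("a3")
names_idx, touched = join_with_hash_lookups("a3")
print(scanned, touched)
```

Both functions return the same names, but the scanning version touches far more documents. That gap is what I assumed the hash indexes would close in the first query.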