1

I have 4842 documents with a sample format {"ID":"12345","NAME":"name_value","KIND":"kind_value",...,"Secondary":{...},"Tertiary":{...}} where “...” are a few more varying number of key value pairs per object

I have indexed KIND as a full text index using - db.collection.ensureFulltextIndex("KIND") before inserting data.Also, KIND is just a one word string. ie. without spaces

Via AQL following queries were executed:

  1. FOR doc IN FULLTEXT(collection, 'KIND', 'DeploymentFile') RETURN doc --> takes 3.54s (avg)

  2. FOR doc IN collection FILTER doc.KIND == 'DeploymentFile' RETURN doc --> takes 1.16s (avg)

2944 Objects returned in both queries

Q1. Assuming that we have used a fulltext index and I haven't hash indexed KIND, shouldn't the query using FULLTEXT function be faster than the normal == operation (since == doesn't utilize the full text index). If so, what am I doing wrong here?

Q2. Utilizing the fulltext index, can i perform a query which does a CONTAINS string or LIKE string?

---UPDATE Q2.The requirement is searching for a substring within a parent string (which is only one word). The substring can lie anywhere within the parent string. (SQL equivalent of LIKE '%text%')

1
  • I've tried the queries on my notebook. The ran in 40ms / 56ms using arangosh. Which driver are you using? Commented Oct 12, 2015 at 11:41

1 Answer 1

2

Q1: The fulltext index does allow for more complex query. It splits the text at word breaks and checks if a word occurs within a larger text. All of these features are not needed in your example. Therefore it generates more overhead than it is saving.

In your example it would be better to create a skip-list or hash-index and search for equality.

Q2: In the simplest form, a fulltext query contains just the sought word. If multiple search words are given in a query, they should be separated by commas. All search words will be combined with a logical AND by default, and only such documents will be returned that contain all search words. This default behavior can be changed by providing the extra control characters in the fulltext query, which are:

  • +: logical AND (intersection)
  • |: logical OR (union)
  • -: negation (exclusion)

Examples:

  • "banana": searches for documents containing "banana"
  • "banana,apple": searches for documents containing both "banana" AND "apple"
  • "banana,|orange": searches for documents containing either "banana" OR "orange" OR both
  • "banana,-apple": searches for documents that contains "banana" but NOT "apple".

Logical operators are evaluated from left to right.

Each search word can optionally be prefixed with complete: or prefix:, with complete: being the default. This allows searching for complete words or for word prefixes. Suffix searches or any other forms are partial-word matching are currently not supported.

Examples:

  • "complete:banana": searches for documents containing the exact word "banana"
  • "prefix:head": searches for documents with words that start with prefix "head"
  • "prefix:head,banana": searches for documents contain words starting with prefix - "head" and that also contain the exact word "banana".

Complete match and prefix search options can be combined with the logical operators.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the prompt reply. This means suppose i have a random string "DeploymentFileName" and I want to perform an SQL equivalent LIKE query on '%File%'. Is there any mechanism by which i can achieve this? In other words, definitely " prefix: " will help me to match a substring at the beginning of the searched string. What if the substring lies in the middle of the searched string?
@PranavAcharya: Just came across this question. The implementation of the fulltext index only supports full and prefix matches. A search like %File% is possible with the LIKE operator and LIKE() function in AQL, but can not make use of any index. In fact, it would require a completely different type of index that can accelerate this type of search (I believe suffix tries). If you can FILTER out many documents supported by an index before calling LIKE(), performance may be acceptable though.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.