What is the order of operations for boolean operators? Left to right? Right to left? Specific operators have higher priority?

For example, if I search for: jakarta OR apache AND website

What do I get? Is it anything with "jakarta" as well as anything with both "apache" and "website"? Anything with "website" that also has either "jakarta" or "apache"? Something else?

  • Welcome to Stack Overflow! Doesn't operator precedence depend on the query parser you use? Commented Dec 15, 2022 at 22:19
  • Hi, it may depend on the programming language but I think usually AND has priority over OR, so your sentence is equivalent to: jakarta OR (apache AND website) Commented Dec 15, 2022 at 22:24

1 Answer

Short answer:

In Lucene, the AND operator takes precedence over the OR operator. So, you are effectively doing this:

jakarta OR (apache AND website)

You can verify this for yourself by parsing your query string and seeing how the parser converts AND and OR to the "required" and "optional" operators.

Since we are discussing precedence: the NOT operator takes precedence over the AND operator.

But you need to be very careful when dealing with Lucene's so-called "boolean" operators, as they do not behave the way you may expect based on their collective name ("boolean").

(Unfortunately, I have never seen official documentation that states these precedence rules; I am relying on empirical observations instead. See below for more about that. If such documentation does exist, it would be great to see.)


Longer Answer

One key thing to understand is that Lucene's boolean operators are not really "boolean" in the sense you may expect from Boolean algebra, where everything evaluates to TRUE or FALSE and where you use parentheses to avoid ambiguity (or need to know which precedence rules a programming language applies).

Lucene boolean operators serve a subtly different purpose.

They are not purely concerned with TRUE/FALSE inclusion or exclusion. They are also concerned with how to score results, so that more relevant results get higher scores than less relevant ones.

The Lucene query jakarta OR apache AND website is equivalent to the following:

jakarta +apache +website

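This means the document's field must contain both apache and website, and may also contain jakarta (for a higher relevance score). A document containing only jakarta does not match, which is not what a strictly Boolean reading of jakarta OR (apache AND website) would give you.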

You can see this for yourself by taking your original query string and parsing it:

Query query = parser.parse(queryString);

...and then printing the resulting string representation of the query. The + operator is the "required" operator. It:

requires that the term after the "+" symbol exist somewhere in the field

The absence of a + operator means the default of "may", as in "may contain": the term is optional and does not need to be present, provided some other clause in the query matches the document.

The use of AND forces the terms on either side of the AND to be required.
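To make that rule concrete, here is a small sketch in plain Java that mimics how a flat query string ends up in +/- notation. It is a simplified model written for illustration, not Lucene's own code: the class ConjunctionSketch and the method toPlusMinus are hypothetical names, it assumes the parser's default operator is OR, and it handles only bare terms joined by AND, OR and NOT (no parentheses, fields or phrases).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the AND/OR/NOT to +/- conversion.
// It does not use or reproduce the Lucene API.
class ConjunctionSketch {

    static String toPlusMinus(String query) {
        List<String> terms = new ArrayList<>();
        List<String> marks = new ArrayList<>(); // "+" required, "-" prohibited, "" optional
        boolean afterAnd = false; // the next term was introduced by AND
        boolean negated = false;  // the next term was introduced by NOT

        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input
            }
            if (token.equals("AND")) {
                afterAnd = true;
                // AND also makes the preceding term required, unless it is prohibited.
                int last = marks.size() - 1;
                if (last >= 0 && !marks.get(last).equals("-")) {
                    marks.set(last, "+");
                }
            } else if (token.equals("OR")) {
                // Assumption: default operator is OR, so an explicit OR changes nothing.
                afterAnd = false;
            } else if (token.equals("NOT")) {
                negated = true;
            } else {
                terms.add(token);
                marks.add(negated ? "-" : (afterAnd ? "+" : ""));
                afterAnd = false;
                negated = false;
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(marks.get(i)).append(terms.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: jakarta +apache +website
        System.out.println(toPlusMinus("jakarta OR apache AND website"));
    }
}
```

The point of the sketch is that each term is marked by looking only at its immediate neighbours, with no grouping step, which is why an OR sitting between two AND pairs ends up having no effect.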


You can encounter some potentially surprising situations.

Consider this:

foo AND bar OR baz AND bat

This parses to the following:

+foo +bar +baz +bat

This is because each AND marks the terms on both sides of it as required (+). Every term in this query sits next to an AND, so the OR has no effect.

It's the same result as if you had written this:

foo AND bar AND baz AND bat

But not the same as this:

(foo AND bar) OR (baz AND bat)

which is parsed to this, where the parentheses are retained:

(+foo +bar) (+baz +bat)

Bottom Line:

Use parentheses to make your intentions explicit when combining AND, OR and NOT.


Regarding NOT, since we mentioned it: it takes precedence over AND.

The query:

foo AND bar NOT baz AND bat

is parsed as:

+foo +bar -baz +bat

So, a document field must contain foo, bar and bat - and must not contain baz.
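The required / optional / prohibited semantics can be sketched as a plain Java match check. This is an illustrative model only: ClauseMatchSketch and matches are hypothetical names, a document is reduced to a set of terms, the clause list is flat (no nested groups), and relevance scoring is left out entirely.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of how +term, -term and bare term clauses decide a match.
// Scoring is deliberately ignored; only inclusion and exclusion are modelled.
class ClauseMatchSketch {

    static boolean matches(List<String> clauses, Set<String> docTerms) {
        boolean hasRequired = false;
        boolean optionalHit = false;

        for (String clause : clauses) {
            if (clause.startsWith("+")) {
                hasRequired = true;
                if (!docTerms.contains(clause.substring(1))) {
                    return false; // a required term is missing
                }
            } else if (clause.startsWith("-")) {
                if (docTerms.contains(clause.substring(1))) {
                    return false; // a prohibited term is present
                }
            } else if (docTerms.contains(clause)) {
                optionalHit = true;
            }
        }
        // With required clauses, optional ones only affect ranking.
        // Without any, at least one optional clause has to match.
        return hasRequired || optionalHit;
    }

    public static void main(String[] args) {
        List<String> query = List.of("+foo", "+bar", "-baz", "+bat");
        System.out.println(matches(query, Set.of("foo", "bar", "bat")));        // true
        System.out.println(matches(query, Set.of("foo", "bar", "baz", "bat"))); // false
    }
}
```

Running the same check on the clauses jakarta, +apache, +website against a document that contains only jakarta returns false, which is the surprise the original question runs into.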


Why does this situation exist?

I don't know, but I think Lucene originally did not include AND, OR and NOT - but instead used + (must include), - (must not include) and "nothing" (may include). The so-called boolean operators AND, OR, NOT were added later on, as a kind of "syntactic sugar" for these original operators - introduced for people who were more familiar with AND, OR and NOT from other contexts. I'm basing this on the following thread:

Getting a Better Understanding of Lucene's Search Operators

A summary of that thread is included in this answer about the NOT operator.
