3

What is the best way to create an index for a query like this?

... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...

I know that only one index can be used in a query so I can't create two indexes, one for user_1 and one for user_2.

Also, could the solution for this type of query be used for the following query as well?

WHERE ((user_1 = '$user_id' AND user_2 = '$friend_id') OR (user_1 = '$friend_id' AND user_2 = '$user_id'))

3 Answers

4

MySQL has a hard time with OR conditions. In theory, there's an index merge optimization that @duskwuff mentions, but in practice, it doesn't kick in when you think it should. Besides, even when it does, it doesn't perform as well as a single index.

The solution most people use to work around this is to split up the query:

SELECT ... WHERE user_1 = ?
UNION
SELECT ... WHERE user_2 = ?

That way each query will be able to use its own choice for index, without relying on the unreliable index merge feature.
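As a rough illustration of the split, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, so it only shows that the two forms return the same rows, not how MySQL chooses indexes. The table name friendships, the id column, and the index names are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (
        id     INTEGER PRIMARY KEY,
        user_1 INTEGER NOT NULL,
        user_2 INTEGER NOT NULL
    );
    -- One single-column index per branch of the UNION.
    CREATE INDEX idx_user_1 ON friendships (user_1);
    CREATE INDEX idx_user_2 ON friendships (user_2);
    INSERT INTO friendships (user_1, user_2) VALUES
        (1, 2), (3, 1), (2, 3), (4, 5), (1, 4);
""")

user_id = 1

# Original form: one SELECT with an OR across two columns.
or_rows = conn.execute(
    "SELECT id, user_1, user_2 FROM friendships"
    " WHERE user_1 = ? OR user_2 = ?"
    " ORDER BY id",
    (user_id, user_id),
).fetchall()

# Split form: each SELECT filters on a single column.
# UNION (not UNION ALL) removes a row that matches both branches.
union_rows = conn.execute(
    "SELECT id, user_1, user_2 FROM friendships WHERE user_1 = ?"
    " UNION"
    " SELECT id, user_1, user_2 FROM friendships WHERE user_2 = ?"
    " ORDER BY id",
    (user_id, user_id),
).fetchall()

print(or_rows == union_rows)
```

On MySQL itself, running EXPLAIN on both forms is the way to confirm which indexes each one actually uses.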

Your second query is optimizable more simply. It's just a tuple comparison. It can be written this way:

WHERE (user_1, user_2) IN (('$user_id', '$friend_id'), ('$friend_id', '$user_id'))

In old versions of MySQL, tuple comparisons would not use an index, but since 5.7.3, it will (see https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html).

P.S.: Don't interpolate application code variables directly into your SQL expressions. Use query parameters instead.
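A minimal sketch of what query parameters look like, again using Python's sqlite3 module as a stand-in (MySQL drivers use the same idea, though the placeholder syntax can differ). The friendships table and the find_pair helper are hypothetical names for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friendships (user_1 INTEGER NOT NULL, user_2 INTEGER NOT NULL)"
)
# Composite index covering both equality branches of the pair lookup.
conn.execute("CREATE INDEX idx_pair ON friendships (user_1, user_2)")
conn.executemany(
    "INSERT INTO friendships (user_1, user_2) VALUES (?, ?)",
    [(1, 2), (3, 1), (2, 3)],
)


def find_pair(conn, user_id, friend_id):
    """Look up a friendship in either direction (hypothetical helper)."""
    sql = (
        "SELECT user_1, user_2 FROM friendships"
        " WHERE (user_1 = ? AND user_2 = ?)"
        "    OR (user_1 = ? AND user_2 = ?)"
    )
    # Values are passed separately from the SQL text, never spliced into it.
    return conn.execute(sql, (user_id, friend_id, friend_id, user_id)).fetchall()


rows = find_pair(conn, 1, 3)                  # stored as (3, 1)
hostile = find_pair(conn, "1' OR '1'='1", 3)  # treated as a plain value

print(rows, hostile)
```

Because the driver sends the values separately, the hostile string is compared as data and cannot change the structure of the WHERE clause.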


1

I know that only one index can be used in a query…

This is incorrect. Under the right circumstances, MySQL will routinely use multiple indexes in a query. (For example, a query JOINing multiple tables will almost always use at least one index on each table involved.)

In the case of your first query, MySQL will use an index merge union optimization. If both columns are indexed, the EXPLAIN output will give an explanation along the lines of:

Using union(index_on_user_1,index_on_user_2); Using where

The query shown in your second example is covered by an index on (user_1, user_2). Create that index if you plan on running those queries routinely.

2 Comments

The condition on user_2 will not use an index on (user_1, user_2), because user_2 is not the left-most column of the index. Just like you can't look up a person by first name only in the telephone book.
@BillKarwin Read the second query closely. It's a union of two equality conditions on (user_1, user_2).
0

The two cases are different.

In the first case, both columns need to be searched for the same value. If you have a two-column index (u1, u2), it can be used for column u1 but not for column u2. If you have two separate indexes on u1 and u2, both will probably be used. The choice comes from statistics on how many rows are expected to be returned. If few rows are expected, an index seek will be selected, provided an appropriate index is available. If the number is high, a scan is preferable, of either the table or the index.

In the second case, both columns again need to be checked, but each search consists of two sub-searches, where the second sub-search runs on the results of the first because of the AND condition. Indexes matter more here, and two indexes on u1 and u2 will help, because whichever field is searched first will have an index. The choice of whether to use an index is made as I describe above.

In either case, however, every OR forces one more search or set of searches. So the proposed solution of splitting the query with UNION does not cost more: the table will be searched x times whether you write one SELECT with ORs or x SELECTs with UNION, regardless of index selection and type of search (seek or scan). Since each SELECT in the UNION gets its own part of the execution plan, it is more likely that (single-column) indexes will be used, and the result sets from all parts are then combined. If you do not want to copy a large SELECT statement into many UNION branches, you can fetch the primary key values first and then select those rows, or use a view so that the bulk of the statement lives in one place.

Finally, if you rule out the UNION option, there is a way to trick the optimizer into using a single index. Create a composite index on (u1, u2) (or (u2, u1); whichever column has higher cardinality goes first) and modify your statement so that every OR branch references all columns:

... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...

will be converted to:

... WHERE ((user_1 = '$user_id' and user_2=user_2) OR (user_1=user_1 and user_2 = '$user_id')) ...

This way, a composite index (u1, u2) can be used for both branches. Please note that this will not work if the columns are nullable, because a comparison such as user_2 = user_2 is not true for NULL values, and working around this with ISNULL or COALESCE may cause the index not to be selected. It will work with ANSI nulls off, however.
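The behavior of the rewritten condition on nullable columns is worth checking before relying on it. The following minimal sketch uses Python's sqlite3 module as a stand-in with standard NULL comparison semantics; the friendships table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both columns are nullable here on purpose.
conn.execute("CREATE TABLE friendships (user_1 INTEGER, user_2 INTEGER)")
conn.executemany(
    "INSERT INTO friendships VALUES (?, ?)",
    [(1, 2), (1, None), (None, 1), (4, 5)],
)

# Plain OR: a NULL in the other column does not affect the match.
plain = conn.execute(
    "SELECT COUNT(*) FROM friendships WHERE user_1 = ? OR user_2 = ?",
    (1, 1),
).fetchone()[0]

# Rewritten form: "user_2 = user_2" evaluates to NULL (not true)
# when user_2 is NULL, so those rows drop out of the result.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM friendships"
    " WHERE (user_1 = ? AND user_2 = user_2)"
    "    OR (user_1 = user_1 AND user_2 = ?)",
    (1, 1),
).fetchone()[0]

print(plain, rewritten)
```

With NOT NULL columns the two forms select the same rows; with nullable columns the rewritten form silently returns fewer.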
