
On my HTML page, the user has the option to either enter a text string, tick checkbox options, or do both. This data is then used in a MySQL query, and the results are displayed.

Because the user is allowed to enter a string, I am using the LIKE operator in the MySQL query.

Correct me if I am wrong, but I believe LIKE can slow a query down a lot. Given that, I would like to know whether an empty pattern in LIKE makes any difference, for example:

select * from hello;
select * from hello where name like "%%";

If it does make a significant difference (I expect this database to grow large), what are your ideas on how to deal with this?

My first idea was to have two queries: one with the LIKE clause and one without it. Depending on what the user enters, the correct query will be called.

So, for example, if the user leaves the search box empty, the LIKE clause is not needed; the page will send an empty value, and an if statement will select the other query (without LIKE) when it sees the value is empty.
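That conditional approach can be sketched in a few lines (a hypothetical Python helper using DB-API-style placeholders; the `hello` table and `name` column are taken from the question above, everything else is made up). The LIKE clause is only appended when the user actually typed something:

```python
def build_query(search):
    """Return (sql, params) for the contents of the search box."""
    sql = "SELECT * FROM hello"
    params = []
    if search:  # empty string or None -> skip the LIKE clause entirely
        sql += " WHERE name LIKE %s"
        params.append("%" + search + "%")  # parameterized: no SQL injection
    return sql, params

# Empty search box: plain query, no LIKE
print(build_query(""))     # -> ('SELECT * FROM hello', [])
# Non-empty search box: LIKE with a bound parameter
print(build_query("bob"))  # -> ('SELECT * FROM hello WHERE name LIKE %s', ['%bob%'])
```

The `(sql, params)` pair can then be passed straight to something like `cursor.execute(sql, params)`, so both cases share the same execution path.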

Is there a better way of doing this?

  • Yep, not adding the LIKE '%%' when there is no string to check against definitely seems like a good way to do this :) Commented Jul 11, 2014 at 20:38
  • Depends on how smart the query compiler is: if it recognizes that LIKE '%%' will match everything and eliminates it from the statement, then your two queries would perform identically. Commented Jul 11, 2014 at 20:38
  • Have you tried testing this and putting it into practice? I am assuming it is sanitized. I would use a statement to evaluate if there is anything the user is submitting and do logic based off of that. You're asking a broad question: "Can I do this better?". The answer is yes. You always can. Commented Jul 11, 2014 at 20:40
  • I can't really test it at the moment, as I do not have that amount of data. The difference in query time is negligible for both of them. Commented Jul 11, 2014 at 20:43
  • It looks like it's not smart enough to elide LIKE '%%'. I just tried select * from users where username like '%%' in a table with 3.3 million rows. It took 1.59 sec. EXPLAIN says it used the index on the username column. Commented Jul 11, 2014 at 21:06

3 Answers


In general, the LIKE function will be slow unless it begins with a fixed string and the column has an index. If you do LIKE 'foo%', it can use the index to find all rows that begin with foo, because MySQL indexes use B-trees. But LIKE '%foo' cannot make use of an index, because B-trees only optimize looking for prefixes; this has to do a sequential scan of the entire table.

And even when you use the version with a prefix, the performance improvement depends on how much that prefix reduces the number of rows that have to be searched. If you do LIKE 'foo%bar', and 90% of your rows begin with foo, this will still have to scan 90% of the table to test whether they end with bar.
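The prefix case can be pictured with a sorted Python list standing in for the B-tree index (a toy illustration only, not MySQL internals): two binary searches bracket the block of entries sharing the prefix, so only that block is scanned, whereas a suffix match has to look at every entry:

```python
import bisect

# Toy stand-in for an index on a name column: a sorted list of values.
names = sorted(["bar", "baz", "food", "foobar", "fool", "qux"])

def like_prefix(sorted_names, prefix):
    """Rows matching LIKE 'prefix%', found via two binary searches."""
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

print(like_prefix(names, "foo"))  # -> ['foobar', 'food', 'fool']

# LIKE '%foo' has no usable prefix: every entry must be examined.
print([n for n in names if n.endswith("foo")])  # -> []
```

The real engine walks a B-tree rather than bisecting a list, but the asymmetry is the same: a leading literal narrows the search to a contiguous range of the index, a leading wildcard does not.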

Since LIKE '%%' doesn't have a fixed prefix, it will perform a full scan of the table, even though there isn't actually anything to search for. It would be best if your PHP script tested whether the user provided a search string and omitted the LIKE test when there's nothing to search for.




I believe the LIKE function can slow the query down a lot

I would expect that not to be the case. How hard would it be to test it?

Regardless of which version of the query you run, the DBMS still has to examine every row in the table. That requires some extra CPU work, but for large tables, disk I/O will be the limiting factor. LIKE '%%' will discard rows with NULL values, potentially reducing the amount of data the DBMS needs to retain in the result set and transfer to the client, which may be a significant saving.

As Barbar says, providing an expression without a leading wildcard will allow the DBMS to use an index (if one is available), which will have a big impact on performance.

It's hard to tell from your question (you didn't provide much in the way of example queries/data, nor any detail of what the application does), but the solution to your problem might be full-text indexing.
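For reference, a full-text version might look like this (a sketch only, reusing the `hello` table and `name` column from the question; FULLTEXT indexes require MyISAM or, from MySQL 5.6, InnoDB):

```sql
-- Hypothetical: add a full-text index on the searched column...
ALTER TABLE hello ADD FULLTEXT INDEX ft_name (name);

-- ...then search with MATCH ... AGAINST instead of LIKE '%...%':
SELECT * FROM hello
WHERE MATCH(name) AGAINST ('search words' IN NATURAL LANGUAGE MODE);
```

Unlike LIKE with a leading wildcard, MATCH ... AGAINST uses the full-text index, so it doesn't need a full table scan.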

1 Comment

I cannot test this out as I do not have that much data yet. The database currently does not have many entries, and testing both methods yields a negligible difference in performance.

Using the World database sample from the mysql software distribution, I first did a simple explain on queries with and without where clauses without filtering effects:

mysql> explain select * from City;

mysql> explain select * from City where true;

mysql> explain select * from City where Name = Name;

In these first three cases, the result is as follows:

+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4080 |       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+

For the last query, however, I got the following:

mysql> explain select * from City where Name like "%%";

+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4080 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+

You can see that for this particular query, the where condition was not optimized away.

I also performed a couple of measurements, to check whether there would indeed be a noticeable difference, but:

  • since the table has only 4080 rows, I used a self cross join to produce longer computation times;

  • I used HAVING clauses to cut down on display overhead (1).

Measurement results:

mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) = concat(c1.Name,c2.Name) having c1.Name = ""; 
Empty set (5.22 sec)

The above query, as well as versions with true or c1.Name = c1.Name, performed essentially the same, within a margin of less than 0.1 sec.

mysql> reset query cache;

mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) like "%%" having c1.Name = "";
Empty set (13.80 sec)

This one also took around the same amount of time across several runs (with query cache resets in between) (2).

Clearly the query optimizer doesn't see an opportunity in the latter case. The conclusion is that you should avoid that clause as much as possible, even when it doesn't change the result set.


(1): since HAVING-clause filtering happens after the rows have been computed, I assumed it shouldn't change the relative query computation load.

(2): interestingly, I initially tried a simple where c1.Name like "%%", and got timing results around 5.0 sec, which led me to try a more elaborate clause. I don't think that result changes the overall conclusion; it could be that in that very specific case the filtering actually has a beneficial effect. Hopefully a MySQL guru will explain that result.

