(If you can think of a better title please say)

I have a database with tables that I want to be able to search. For example, given the input intel core i5, I want every row whose Name column contains intel, core and i5, in any order, regardless of which characters or other words surround them.

So far I am doing:

search = "intel core i5" # This is taken from an entry field.
words = []
for word in search.split():
    words.append("%" + word.strip() + "%")
results = db_handler.query("""SELECT Table1.ID 
                              FROM Table1 
                                  JOIN Table2 
                                  ON Table1.ID2 = Table2.ID 
                              WHERE Table2.Name LIKE({0}) AND 
                                  Active=1""".format(", ".join("?" for _ in words)), words)
# db_handler.query is a method which queries and returns results. Plus some other stuff.
# In Table1 there is some columns like ID, ID2, Active, eta
# ID2 matches ID in Table2 which also contains Name
# So I want ID from Table1 searched by Name from Table2  

But that does not work, because LIKE does not take more than one argument. (I split the input into words because that is what made sense to me, but is there a better way?) I have seen REGEXP suggested to people with slightly different questions, and I have looked at it, but I did not really understand how to apply it to my case. If it is the best option, could you explain how to use it? How should I do this?

Thanks

2 Answers

LIKE takes one pattern, but you can include multiple keywords in it, for example:

... LIKE '%intel%core%i5%' ...

This will match values containing intel followed by core followed by i5, with arbitrary strings before, after, and in between.
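
As a quick check of the ordering behaviour, here is a minimal sketch using Python's sqlite3 module with an in-memory table. The table name parts and its rows are made up for illustration, and the sketch relies on SQLite's default LIKE being case-insensitive for ASCII characters.

```python
import sqlite3

# Throwaway in-memory table; "parts" and its rows are invented for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (Name TEXT)")
conn.executemany(
    "INSERT INTO parts (Name) VALUES (?)",
    [("Intel Core i5-4690K",), ("i5 Core by Intel",), ("AMD Ryzen 5",)],
)

# One pattern: the words must appear in exactly this order.
pattern = "%intel%core%i5%"
ordered_matches = [
    row[0]
    for row in conn.execute("SELECT Name FROM parts WHERE Name LIKE ?", (pattern,))
]
print(ordered_matches)
```

Only the row with the words in the order intel, core, i5 comes back; the row "i5 Core by Intel" contains all three words but is not matched by this single pattern.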

To find records containing the words in any arrangement, you need one LIKE clause per permutation (there are 6 of them in this example), ORed together, for example:

... (Table2.Name LIKE '%intel%core%i5%' OR Table2.Name LIKE '%intel%i5%core%' OR ...) ...

In context:

from itertools import permutations

search = "intel core i5"
words = [word.strip() for word in search.split()]
combinations = ["%" + "%".join(order) + "%" for order in permutations(words)]
sql = """SELECT <Columns> FROM <Table> 
         WHERE [Other ...] 
             (<Col> LIKE {0})""".format(" OR <Col> LIKE ".join("?" for _ in combinations))
values = combinations # sql and values/combinations to be sent to a query function.
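
To see the permutation approach run end to end, here is a small self-contained sketch against an in-memory SQLite database. The table parts, its rows, and the helper permutation_patterns are assumptions made for the demo, not part of the asker's schema.

```python
import sqlite3
from itertools import permutations


def permutation_patterns(search):
    """Return one LIKE pattern per ordering of the words in `search`."""
    words = search.split()
    return ["%" + "%".join(order) + "%" for order in permutations(words)]


patterns = permutation_patterns("intel core i5")
# One "Name LIKE ?" per pattern, ORed together; the values stay parameterised.
where = " OR ".join("Name LIKE ?" for _ in patterns)

# Hypothetical demo table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (Name TEXT)")
conn.executemany(
    "INSERT INTO parts (Name) VALUES (?)",
    [("Intel Core i5-4690K",), ("i5 Core by Intel",), ("Intel Core i7",)],
)

perm_matches = sorted(
    row[0]
    for row in conn.execute("SELECT Name FROM parts WHERE " + where, patterns)
)
print(len(patterns), perm_matches)
```

Note that the number of patterns grows factorially with the number of words (3 words give 6 patterns, 5 words give 120), so this is only practical for short searches.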

8 Comments

Thanks. As the input is not limited, I would have to build the query dynamically (which I am doing at the moment), but I would also have to generate all the combinations. This can be done, but is there a more elegant/simple solution?
No, SQL doesn't have an operator that matches any arrangement of multiple keywords. Brute-forcing all permutations as I suggested is the only way.
However, you can generate the permutations easily in Python using itertools.permutations
I know a bit about itertools but not permutations. I'll look at the docs, but does it take args and return all possible combinations?
For example list(itertools.permutations(('intel', 'i5', 'core'), 3)) will give all 6 permutations of length 3.

Let's suppose (to consider the general problem) that you want to be able to find rows with any or all of the words in list or set words in column col. Let's also further suppose that you are fully aware of the risks of SQL injection attacks, and so you are fully sanitising any user input before including it in words.

The query condition you want for a single word w can be expressed in Python as

"{0} LIKE '%{1}%'".format(col, w)

The individual field queries need to be joined with "OR" to find any term or "AND" to find all terms. So you can set joiner to be either ' AND ' or ' OR ' and then the whole query search condition will be

joiner.join(["{0} LIKE '%{1}%'".format(col, w) for w in words])

If you set

words = "intel core i5".split()
joiner = " AND "
col = "Table2.name"

this evaluates to

Table2.name LIKE '%intel%' AND Table2.name LIKE '%core%' AND Table2.name LIKE '%i5%'

You clearly already know enough Python to be able to do what you want with that.
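
For what it's worth, the same idea can be written with "?" placeholders so the words never have to be pasted into the SQL text. This is only a sketch: the helper name build_search and the sample rows are made up, and the table layout (Table1 with ID, ID2, Active; Table2 with ID, Name) is taken from the question.

```python
import sqlite3


def build_search(words, joiner=" AND "):
    """Return (sql, params) for rows whose Name contains the given words.

    joiner is " AND " to require every word or " OR " to accept any word.
    """
    if not words:
        raise ValueError("need at least one search word")
    condition = joiner.join("Table2.Name LIKE ?" for _ in words)
    sql = (
        "SELECT Table1.ID FROM Table1 "
        "JOIN Table2 ON Table1.ID2 = Table2.ID "
        "WHERE (" + condition + ") AND Table1.Active = 1"
    )
    params = ["%" + w + "%" for w in words]
    return sql, params


# Hypothetical sample data using the schema described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, ID2 INTEGER, Active INTEGER)")
conn.executemany(
    "INSERT INTO Table2 (ID, Name) VALUES (?, ?)",
    [(1, "Intel Core i5-4690K"), (2, "i5 Core by Intel"), (3, "Intel Core i7")],
)
conn.executemany(
    "INSERT INTO Table1 (ID, ID2, Active) VALUES (?, ?, ?)",
    [(10, 1, 1), (11, 2, 1), (12, 3, 1), (13, 1, 0)],
)

words = "intel core i5".split()
sql, params = build_search(words)
all_ids = sorted(row[0] for row in conn.execute(sql, params))

sql_any, params_any = build_search(words, joiner=" OR ")
any_ids = sorted(row[0] for row in conn.execute(sql_any, params_any))
print(all_ids, any_ids)
```

With " AND " the word order in the name does not matter, so no permutations are needed; the inactive row (ID 13) is filtered out by the Active = 1 condition in both cases.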

2 Comments

Nice general overview. How much protection does the "?" placeholder in Python's sqlite3 module (a query with "?" plus (value,)) offer against SQL injection? That is what I have read you should use. For my product it does not matter much, as the db is stored by the user, with desktop access only by the user. The only way in could be through the web scraper, which only gets data from the pcpp website and is also filtered through beautifulsoup4, if that changes anything. Another question: if you were going to include results that match fewer than all of the words, you would want to list them by relevance, so results would need a relevance value too.
If you use parameter substitutions in the driver you're supposed to be immune from SQL injection. Unfortunately, parameterisation isn't possible for table and column names, just values. I didn't even think about relevance scores - that would need a much more structured approach to storage and search.
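
One common way around the column-name limitation is to check the name against a fixed list before formatting it into the SQL, while the search words still go through "?" placeholders. A rough sketch, where ALLOWED_COLUMNS and like_condition are hypothetical names chosen for illustration:

```python
# Hypothetical whitelist: only these column names may be formatted into SQL.
ALLOWED_COLUMNS = {"Table2.Name"}


def like_condition(col, words, joiner=" AND "):
    """Return (condition, params); col is checked, words are parameterised."""
    if col not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: " + repr(col))
    condition = joiner.join(col + " LIKE ?" for _ in words)
    params = ["%" + w + "%" for w in words]
    return condition, params


condition, params = like_condition("Table2.Name", ["intel", "i5"])
print(condition, params)
```

The returned condition can be dropped into a WHERE clause and params passed as the second argument to the query call.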
