2

I need to store content keyed by strings, so a database table of key/value pairs, essentially. The keys, however, will be of a hierarchical format, like this:

foo.bar.baz

They'll have multiple categories, delimited by dots. The above value is in a category called "baz" which is in a parent category called "bar" which is in a parent category called "foo."

How can I index this in such a way that it's rapidly searchable for different permutations of the key/dot combo? For example, I want to be able to very quick find everything that starts

foo

Or

foo.bar

Yes, I could do a LIKE query, but I never need find anything like:

fo

So that seems like a waste to me.

Is there any way that SQL would index all permutation of a string delimited by the dots? So, in the above case we have:

foo
foo.bar
foo.bar.baz

Is there any type of index that would facilitate searching like that?

Edit

I will never need to search backwards or from the middle. My searches will always begin from the front of the string:

foo.bar

Never:

bar.baz

2 Answers 2

4

SQL Server can't really index substrings, no. If you only ever want to search on the first string, this will work fine, and will perform an index seek (depending on other query semantics of course):

WHERE col LIKE 'foo.%';
-- or
WHERE col LIKE 'foo.bar.%';

However when you start needing to search for bar or baz following any leading string, you will need to search on the substring:

WHERE col LIKE '%.bar.%';
-- or
WHERE PATINDEX('%.bar.%', col) > 0;

This won't work well with regular B-tree indexes, and I don't think Full-Text Search will be much help either, because of the special characters (periods) - but you should try it out if this is a requirement.

In general, storing data this way smells wrong to me. Seems to me that you should either have separate columns instead of jamming all the data into one column, or using a more relational EAV design.

Sign up to request clarification or add additional context in comments.

3 Comments

I edited and clarified -- I will never search from anywhere but the front of the string. As for this being wrong -- I agree. There's a level of denormalization here that's not ideal, but it's not in my control. I'm working with what I was given.
I loaded up a table with 1,000,000 rows in the format I explained. Query times for a LIKE from the start ('something%') were 450ms. If I added a wildcard to the start ('%something%'), query times doubled to about 850ms. Clearly, some index is at work when searching from the start. Given that I will only ever search from the start, this isn't bad performance.
Yes, like I said, if you search from the start, you will likely get an index seek.
0

Its appears to be a work for CTE!

create TableA(
id int identity,
parentid int null,
name varchar(50)
)

for a (fixed) two level its easy

select t2.name, t1.name
from tableA t1
join tableA t2 on t2.id = t1.parentid
where t2.name = 'father'

To find that kind of hierarchical values for a most general case you ill need some kind of recursion in self-join table by using a CTE.

http://msdn.microsoft.com/pt-br/library/ms175972.aspx

2 Comments

I think you missed that the three pieces of data are stored as a single string in a single column, e.g. column = 'foo.bar.baz'. They aren't self-linked by things like parentid, they're flattened.
That's embarassing at best. Anyway the OP sounded a bit strange. No SQL can index by delimited dots. Relational DB are good at handling relational entities not parsing string based logic. Can OP abstract that data in that way?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.