5

I have an Oracle background, and using index-organized tables (IOTs) for every table sounds unreasonable in Oracle; I have never actually seen it done. In SQL Server, every database I have worked on has a clustered index on every table, which is conceptually the same as an IOT.

Why is that? Is there any reason for using a clustered index everywhere? It seems to me that they would be good only for a handful of cases.

Thanks

3
  • 4
    Here is a related question on DBA.SE with some info and a couple of links for further reading: "Performance of Non Clustered Indexes on Heaps vs Clustered Indexes". Commented Apr 15, 2012 at 12:41
  • 4
    Probably a question best answered by someone familiar with both Oracle and SQL Server. Database Administrators might be a better location for this. Commented Apr 15, 2012 at 13:33
  • 1
    Also, suggest you move this question to dba.se. It's had two comments and a (purely coincidental) answer from DBA.SE regulars without any of the other posters actually picking up that clustered indexes and IOTs actually have significant differences. Commented Apr 15, 2012 at 16:20

4 Answers

6

A clustered index is not quite the same thing as an index-organised table. An IOT is organised on its primary key, which is mandatory and therefore unique. A clustered index on SQL Server does not have to be unique, and does not have to be the primary key.

Clustered indexes are widely used on SQL Server, as there is almost always some natural ordering that makes a commonly used query more efficient. IOTs in Oracle carry more baggage, so they aren't quite as useful, although they may well be more useful than they're commonly given credit for.

Historically, really old versions of SQL Server (pre-6.5 or 7.0, IIRC) did not support row-level locking and could only lock at table or page level. Often a clustered index would be used to ensure that writes were scattered around the table's physical storage to minimise contention on page locks. However, SQL Server 6 went out of support some years ago, so applications with this issue will be restricted to rare legacy systems.


2 Comments

I generally don't mind clustered indexes on dimension tables (small tables). On fact tables, however, I'm not sure it's a good idea: it slows down loading and full scans. And in almost all cases the natural ordering is time-based, generally the order in which the data was loaded.
@Younes - clustered indexes aren't really much use on a fact table as most queries will involve a table scan. Maybe with a version that doesn't support partitioning (e.g. 2012 B.I. edition) you might want to use a clustered index on the date or period column to minimise the I/O on load or archiving operations. Queries with date ranges may also be able to use the clustered index to cut down on I/O by using range scan operations.
2

Without a clustered index, your table is organized as a heap. This means that every row that is inserted is added to the data page at the end of the table. Also, as rows get updated, they get moved to the data page at the end of the table if the updated data is larger than before and no longer fits in its original place.

When it is good to not have a clustered index

If you have a table that needs the fastest possible inserts but can sacrifice update and read speed, then not having a clustered index may work for you. One example would be a table used as a queue: lots of inserts that later just get read and moved to a different table.

Clustered Indexes

Clustered indexes organize the data in your table based on the columns in the clustered index. If you cluster on the wrong thing, for instance a uniqueidentifier, this can slow things down (see below).

As long as your clustered index is on the value that is most commonly used for searching, and it is unique and increasing, then you get some amazing performance benefits out of the clustered index. For instance, if you have a table called USERS where you commonly look up user data based on USER_ID, then clustering on USER_ID would speed up all of those lookups. This simply reduces the number of data pages that need to be read to get at your data.
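To put rough numbers on "fewer data pages read", here is a back-of-the-envelope sketch in Python. It is not how SQL Server costs a query; it only compares scanning every page of an unindexed heap with walking a B-tree from root to leaf. The function names and the figures used (rows per page, keys per index page) are assumptions made up for illustration.

```python
import math


def heap_scan_pages(total_rows: int, rows_per_page: int) -> int:
    """Pages read to find one row in a heap with no usable index: all of them."""
    return math.ceil(total_rows / rows_per_page)


def clustered_seek_pages(total_rows: int, rows_per_page: int,
                         keys_per_index_page: int) -> int:
    """Pages read by a seek: one page per B-tree level, leaf (data) level included."""
    pages_at_level = math.ceil(total_rows / rows_per_page)  # leaf level = data pages
    levels = 1
    # Each higher level needs one entry per page of the level below it.
    while pages_at_level > 1:
        pages_at_level = math.ceil(pages_at_level / keys_per_index_page)
        levels += 1
    return levels


# Hypothetical USERS table: 1,000,000 rows, 100 rows per data page,
# 500 keys per intermediate index page (all assumed figures).
scan = heap_scan_pages(1_000_000, 100)
seek = clustered_seek_pages(1_000_000, 100, 500)
print(f"heap scan: {scan} pages, clustered seek: {seek} pages")
# prints "heap scan: 10000 pages, clustered seek: 3 pages"
```

A heap with a nonclustered index on USER_ID would also avoid the full scan, at the cost of one extra lookup into the heap per row found, so the comparison above is the extreme case.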

If you have too many key columns in your clustered index, this can also slow things down.

General rules for clustered indexes:

Don't cluster on any varchar columns.

Clustering on INT IDENTITY columns is usually best.

Cluster on what you commonly search on.

Clustering on UniqueIdentifiers

Uniqueidentifiers are extremely inefficient in an index because they have no natural sort order. Because of the b-tree structure of the index, new values land on random pages, so you end up with extremely fragmented indexes. Rebuilding or reorganizing helps only briefly; the index soon becomes extremely fragmented again. So you end up with a slower index that is really huge in memory and on disk due to the fragmentation. Inserts of a uniqueidentifier are also more likely to cause a page split in the index, which slows the insert down. Generally, uniqueidentifiers are bad news for indexes.
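The page-split effect can be seen in a toy simulation. The Python sketch below is a simplified model, not the SQL Server storage engine: leaf pages hold a fixed number of keys (`PAGE_CAPACITY` is an assumed figure), a full page splits in half when a key lands in the middle of the index, and an insert past the current end of the index simply starts a new page. Random 128-bit integers stand in for uniqueidentifier values, and increasing integers stand in for an INT IDENTITY column.

```python
import bisect
import random

PAGE_CAPACITY = 100  # assumed number of keys that fit on one leaf page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into ordered leaf pages; return (pages, splits, avg_fill)."""
    pages = [[]]   # each page is a sorted list; pages are kept in key order
    seps = []      # seps[i] is the lowest key stored on pages[i + 1]
    splits = 0
    for key in keys:
        idx = bisect.bisect_right(seps, key)   # page whose range covers key
        page = pages[idx]
        if len(page) >= capacity:
            if idx == len(pages) - 1 and key > page[-1]:
                # Insert past the end of the index: start a fresh page,
                # no existing rows have to move.
                pages.append([key])
                seps.append(key)
                continue
            # Insert into the middle of a full page: move half the rows out.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, right)
            seps.insert(idx, right[0])
            splits += 1
            if key >= right[0]:
                page = right
        bisect.insort(page, key)
    avg_fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, avg_fill


n = 10_000
rng = random.Random(42)  # fixed seed so runs are repeatable

seq_pages, seq_splits, seq_fill = simulate_inserts(range(n))
rnd_pages, rnd_splits, rnd_fill = simulate_inserts(
    rng.getrandbits(128) for _ in range(n))

print(f"increasing keys: {seq_pages} pages, {seq_splits} splits, fill {seq_fill:.0%}")
# prints "increasing keys: 100 pages, 0 splits, fill 100%"
print(f"random keys:     {rnd_pages} pages, {rnd_splits} splits, fill {rnd_fill:.0%}")
```

In this model the increasing keys fill every page completely with no splits, while the random keys trigger many splits and leave pages partly empty, which is the same pattern of extra pages and wasted space described above.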

Summary

My recommendation is that every table should have a clustered index on it unless there is a really good reason not to (i.e. a table functioning as a queue).

1 Comment

This confirms my understanding of clustered indexes. I can understand having them on lookup tables with a finite number of rows; that fits the bill. And basically, a heap for fact tables that keep growing and that are naturally ordered when inserted. The one that puzzled me all the time is the one you describe under "Clustering on UniqueIdentifiers". I inherited a database with one of those on a 2B-row table, and growing! It never made sense to me! On top of that, it had an automated job to rebuild it. Thanks, many things start to make sense now.
1

I wouldn't know why you would prefer a heap over a clustered index most of the time. Using clustering, you get one index of your choice for free. Most of the time this is the primary key (which you probably want to enforce anyway!).

Heaps are mostly for special situations.


0

We use primary keys in relational databases, and in general relations are established via these primary keys. Most people name the first field TableID and make it the primary key. When you join two or more tables in your query, you will get the fastest result if you use clustered indexes.
