4

I created indexes in postgres for the below mentioned table using md5. The indexes and the table are given below:

 create table my_table(col1 character varying, col2 character varying, col3 character varying);

my_table looks like (I have just given an example. My actual table is of 1Tera Byte):

   col1  col2   col3
   <a12> <j178> <k109>

create index index1 on my_table (md5(col1), md5(col2), md5(col2));

I tried to create index without using md5, however I ended up getting the error:

ERROR:  index row size 2760 exceeds maximum 2712 for index "index1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

However, I notice that my query processing time remains the same whether I had created index or not. I am confused as to what could be the reason. Can someone please help me with this?

The sql query which I have fired is of the form:

select col3 from my_table where col1='<a12>' and col2='<j178>';
2
  • md5? What do you think it means? Commented Oct 15, 2013 at 23:50
  • "My actual table is of 1Tera Byte" --- god save your database ))) Commented Oct 15, 2013 at 23:52

3 Answers 3

6

Since you are getting an error when you try to create a standard btree index I am guessing that the data in one or more of these columns is quite large.

The index you created is perhaps best described as "a b-tree index of the md5 hashes of the three columns", rather than an "md5 index of three columns".

In order for PostgreSQL to use the index, your query must be for md5 hashes. Try:

SELECT col3
FROM my_table 
WHERE
      md5(col1) = md5('<a12>')
  and md5(col2) = md5('<j178>')

The planner will say "oh, I have an index of md5(col1) etc, I'll use that". Note that this will only work for full equality queries (=) and not for LIKE or range queries. Also it won't get the value for col3 from the index because only the md5 of col3 is stored there, so it will still need to go to the table to get the col3 value.

For a small table this would probably result in the planner deciding to skip the index and just do a full scan on the table, but it sounds like your table is big enough that the index will be worthwhile - postgres will scan the index, find the row entries that match, then retreive those rows from the table.

Now if col3 is the one that has large hunks of data in it and cols 1 and 2 are small, you could just create a normal index of col1, col2. You really only need to index the columns in your where clause, not those in the select part.

The postgres indexes documentation is pretty good: http://www.postgresql.org/docs/9.0/static/indexes.html but the CREATE INDEX page is probably the single most useful one: http://www.postgresql.org/docs/9.1/static/sql-createindex.html

The best way to find out if your indexes are being used is to use the "EXPLAIN" instruction: http://www.postgresql.org/docs/9.1/static/sql-explain.html - if you use pgadmin3 to play with your DB (I highly recommend it) then just press F7 in the query window and it will do the explain and present it in a nice GUI showing you the query plan. This has saved many hours of hair-pulling trying to find out why my indices weren't being used.

Sign up to request clarification or add additional context in comments.

Comments

1

why are you applying MD5 while creating index ? you can take advantage of HASH index by storing the MD5 values of interested column and create HASH index on that column and also search by that hashed value. but I think you do not need all that.

I see you are creating B-TREE index you should just use regular column, w/o any Hash encryption. B-TREE index perform best when you have straight Operator in Where clause.

Comments

1

Create index for each column, not combined column. If you create index for several separated column, postgresql query planar can combine those using what it calls a bitmap index scan. Combining single column indexes is often as fast, and you can use them in any query that references a column that you indexed. Creating index for combined columns is not good design.

Reference Postgresql doc 11.5. Combining Multiple Indexes

About md5, I got posted without your refresh. using md5() is ok. Like other answers, you should also use md5() in where clause and need to add full data comparision to account for the possible collisions of hash.

And there are another possibility. Single column index can help you reduce index row size than multi column combined index.

2 Comments

"not composite value" --- could you elaborate that? How would postgresql use 2 separated index for matching 2 columns?
@zerkms postggresql can use 2 separated index for matching 2 columns. postgresql.org/docs/9.1/static/indexes-bitmap-scans.html

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.