I need to create a hash value computed from all columns of a table. With CHECKSUM this is easy, but Microsoft does not recommend CHECKSUM for change detection:

If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend the use of CHECKSUM only if your application can tolerate an occasional missed change. Otherwise, consider using HashBytes instead. With a specified MD5 hash algorithm, the probability that HashBytes will return the same result, for two different inputs, is much lower compared to CHECKSUM.

HASHBYTES accepts only two parameters: the algorithm name and a single input expression.

So although HASHBYTES is more reliable than CHECKSUM, there doesn't seem to be an easy way to apply it to multiple columns.

An example using CHECKSUM:

create table dbo.chksum_demo1
(
    id int not null,
    name varchar(25),
    address varchar(250),
    HashValue as Checksum (id,name,address),
    CONSTRAINT PK_chksum_demo1 PRIMARY KEY (Id)
)

How can we do the above using Hashbytes instead of checksum?

4 Answers

12

One method is to concat() the fields together with a delimiter between them. Convert any date columns to strings explicitly so that you control the formatting.

 HashValue as HASHBYTES('SHA2_256', CONCAT(ID,'|',name,'|',address)) 

The delimiter is needed to handle empty fields so ID:1 Name:'' Address:'12' is different from ID:1 Name:'12' Address:''.
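
Applied to the table from the question, the computed column might look like the sketch below. The table name hashbytes_demo1 and the created_at column are hypothetical, added only to illustrate the date-formatting advice; CONVERT style 126 (ISO 8601) is one possible fixed format, not the only one. Keep in mind that CONCAT treats NULL as an empty string, so a NULL name and an empty name produce the same hash with this approach.

```sql
-- Sketch only: hashbytes_demo1 and created_at are illustrative names,
-- not part of the original question.
CREATE TABLE dbo.hashbytes_demo1
(
    id         int NOT NULL,
    name       varchar(25),
    address    varchar(250),
    created_at datetime2(3),
    -- Explicit CONVERT style pins the date format, so the hashed string
    -- does not depend on session language or date settings.
    HashValue AS HASHBYTES(
        'SHA2_256',
        CONCAT(id, '|', name, '|', address, '|',
               CONVERT(varchar(33), created_at, 126))
    ),
    CONSTRAINT PK_hashbytes_demo1 PRIMARY KEY (id)
);
```

If a column value can itself contain the delimiter character, picking a delimiter that cannot occur in the data (or escaping it) avoids reintroducing the ambiguity.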

2 Comments

Why is the delimiter necessary, or is it optional?
It is necessary. Without a delimiter, two different rows can concatenate to the same string and therefore produce the same hash. Think A = 1 and B = 10 versus A = 11 and B = 0. Both boil down to HASHBYTES('SHA2_256', '110'), whereas with a delimiter they become HASHBYTES('SHA2_256', '1|10') and HASHBYTES('SHA2_256', '11|0').
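
The collision described in this comment can be checked directly. This is a small self-contained sketch using literal values rather than a real table:

```sql
-- CONCAT(1, 10) and CONCAT(11, 0) both yield the string '110',
-- so their hashes are identical; the delimited inputs differ.
SELECT
    CASE WHEN HASHBYTES('SHA2_256', CONCAT(1, 10))
            = HASHBYTES('SHA2_256', CONCAT(11, 0))
         THEN 'same' ELSE 'different' END AS without_delimiter,
    CASE WHEN HASHBYTES('SHA2_256', CONCAT(1, '|', 10))
            = HASHBYTES('SHA2_256', CONCAT(11, '|', 0))
         THEN 'same' ELSE 'different' END AS with_delimiter;
```

The first column reports 'same' because the two inputs are literally the same string; the second compares '1|10' with '11|0'.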
11

Use this:

SELECT *,    
HASHBYTES('MD5', (SELECT ID, name, address FOR XML RAW))
FROM Table1

1 Comment

Does either the XML specification or Microsoft give any guarantee about the stability of the output of "FOR XML RAW"? While a hash can produce collisions (all hashes do, as a mapping from a larger to a smaller cardinality), the same input should always produce the same output. Some things that might change in the output while still yielding an equivalent XML document (and thus a valid and correct result of the query) are the ordering of attributes, spacing, quoting, etc.
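
One way to reason about this concern is to look at the exact string that gets hashed. The sketch below uses a hypothetical inline row (the alias t and its values are made up for illustration) and returns the serialized XML next to its hash, so the two can be compared across servers or versions:

```sql
-- Illustrative data only; t and its values are not from the original post.
SELECT
    t.ID,
    -- The text that HASHBYTES actually receives.
    (SELECT t.ID, t.name, t.address FOR XML RAW) AS HashedText,
    HASHBYTES('SHA2_256',
              (SELECT t.ID, t.name, t.address FOR XML RAW)) AS HashValue
FROM (VALUES (1, 'Ann', '12 Main St'),
             (2, NULL,  '12 Main St')) AS t(ID, name, address);
```

In attribute-centric FOR XML RAW output, NULL columns are omitted rather than written as empty attributes, so a NULL and an empty string serialize differently here. Also note that on SQL Server 2014 and earlier, HASHBYTES input is limited to 8000 bytes, which wide rows can exceed.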
11
SELECT HASHBYTES('<algorithm>', CONCAT_WS('|', f1, f2, f3, f4 ...))
FROM Table1

<algorithm> ::= MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512
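
One caveat with CONCAT_WS (available from SQL Server 2017) is that it skips NULL arguments entirely, including their separator, so a NULL in different positions can yield the same string. A minimal sketch with literal values, using ISNULL as one possible workaround:

```sql
-- CONCAT_WS drops NULL arguments, so both expressions return 'a|b'.
SELECT
    CONCAT_WS('|', 'a', NULL, 'b') AS null_in_middle,
    CONCAT_WS('|', 'a', 'b', NULL) AS null_at_end;

-- Replacing NULL with an empty string keeps every position:
-- the results are 'a||b' and 'a|b|' respectively.
SELECT
    CONCAT_WS('|', 'a', ISNULL(NULL, ''), 'b') AS null_in_middle,
    CONCAT_WS('|', 'a', 'b', ISNULL(NULL, '')) AS null_at_end;
```

With ISNULL, NULL and an empty string still hash the same; a sentinel value that cannot occur in the data would be needed to tell them apart. Of the algorithms listed, Microsoft's documentation marks everything other than SHA2_256 and SHA2_512 as deprecated starting with SQL Server 2016.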

4

Copying the best suggestion from the thread linked below, and adding a WHERE clause to show that it can be combined with a filter:

select MBT.refID,
hashbytes(
    'MD5',
    (select MBT.* from (values(null))foo(bar) for xml auto)
) as [Hash]
from MyBaseTable as MBT
where MBT.SomeFilter='X'

https://www.sqlservercentral.com/forums/topic/suggestionsolution-for-using-hashbytes-across-entire-table
