SQL Server HASHBYTES function returning un

Question

I'm using the HASHBYTES function in T-SQL to generate an MD5 hash of some data, but I'm getting unexpected results even though I'm hashing the same data. What am I doing wrong here?

For demonstration purposes I'll create a table and insert a random GUID as the 'CustomerId' and a random email address as the 'EmailAddress'. 'ConcatHash' is a computed column that should create an MD5 hash of the two columns joined by the pipe character. To make it easier to see what's going on, I have also added a 'ConcatColumn' that shows what CONCAT_WS produces.

CREATE TABLE dbo.CustomerTest
(
    CustomerId UNIQUEIDENTIFIER NOT NULL
  , EmailAddress VARCHAR(255) NOT NULL
  , ConcatColumn AS (CONCAT_WS('|', CustomerId, EmailAddress))
  , ConcatHash AS (HASHBYTES('MD5', CONCAT_WS('|', CustomerId, EmailAddress))) PERSISTED
)
GO

INSERT INTO dbo.CustomerTest
VALUES
('8E38101D-988E-4BF1-B8F1-E8E0B8DAA891', '[email protected]')
GO

SELECT * FROM dbo.CustomerTest

Here is the result...

I'll now query the same data from a different table, using CONCAT_WS and HASHBYTES in exactly the same way as I did previously.

SELECT CustomerId
     , Email
     , CONCAT_WS('|', CustomerId, Email)                   AS ConcatColumn
     , HASHBYTES('MD5', CONCAT_WS('|', CustomerId, Email)) AS ConcatHash
FROM dbo.Customers
WHERE CustomerId = '8E38101D-988E-4BF1-B8F1-E8E0B8DAA891'

Here is the result...

Here are the results side by side. The data is the same and the concatenated data is the same, yet the MD5 hash is different...

To save you the trouble of looking at the 'ConcatColumn' column letter by letter, I have already verified they are identical. So why is the MD5 hash different?
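Checking the characters (by eye or with =) can report two strings as identical while their types and bytes differ, because SQL Server implicitly converts VARCHAR to NVARCHAR before comparing them. A minimal sketch of a byte-level check, using hypothetical variables @v and @n rather than the question's tables:

```sql
-- Hypothetical values: same characters, different data types.
DECLARE @v VARCHAR(100)  = 'abc';
DECLARE @n NVARCHAR(100) = N'abc';

SELECT SQL_VARIANT_PROPERTY(@v, 'BaseType') AS VarcharType,    -- varchar
       SQL_VARIANT_PROPERTY(@n, 'BaseType') AS NvarcharType,   -- nvarchar
       DATALENGTH(@v)                       AS VarcharBytes,   -- 3
       DATALENGTH(@n)                       AS NvarcharBytes,  -- 6
       CASE WHEN @v = @n THEN 'equal' ELSE 'different' END
                                            AS TextComparison; -- equal (implicit conversion to NVARCHAR)
```

Running the same DATALENGTH and SQL_VARIANT_PROPERTY checks on ConcatColumn in dbo.CustomerTest and on CONCAT_WS('|', CustomerId, Email) in dbo.Customers would likely show the mismatch, assuming dbo.Customers.Email is NVARCHAR (its definition is not shown in the question).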

Hash functions operate on bytes, not characters. Both columns need to be either NVARCHAR, or VARCHAR with identical collations. – Jeroen Mostert, commented May 25, 2021 at 21:57
@lptr: Yes, because for VARCHAR fields, collation also determines how characters are encoded. For NVARCHAR fields it does not, as they are always UTF-16. (OK, technically the collations do not need to be identical: Latin1_General_CI_AS and Latin1_General_CI_AI encode the same because only the accent-sensitivity rules differ, for example. But, say, Japanese_ is quite different.) – Jeroen Mostert, commented May 25, 2021 at 22:15
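To see the bytes the comments refer to, a value can be cast to VARBINARY. A minimal sketch using the hypothetical literal 'abc' (chosen because its MD5 is a well-known test vector), assuming an ASCII-compatible default collation for the VARCHAR literal:

```sql
-- VARCHAR literal: one byte per character for this ASCII text.
-- NVARCHAR literal: UTF-16LE, two bytes per character.
SELECT CAST('abc'  AS VARBINARY(20)) AS VarcharBytes,   -- 0x616263
       CAST(N'abc' AS VARBINARY(20)) AS NvarcharBytes,  -- 0x610062006300
       HASHBYTES('MD5', 'abc')       AS VarcharMd5,     -- 0x900150983CD24FB0D6963F7D28E17F72
       HASHBYTES('MD5', N'abc')      AS NvarcharMd5;    -- a different 16-byte value
```

Both values display as abc, but HASHBYTES receives different input bytes, so the digests differ.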

AlwaysLearning · Accepted Answer · 2021-05-25 22:01:58Z

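varchar and nvarchar columns do not produce the same hash results, even when they contain the same text, because HASHBYTES hashes bytes: nvarchar stores UTF-16 (two bytes per character for this data), while varchar uses the code page of its collation (one byte per character here). CONCAT_WS returns nvarchar when any of its arguments is nvarchar, so in the demo below changing only the Email column's type is enough to change the hash...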

-- Setup demo data...
create table dbo.Customers1 (
  CustomerId varchar(255),
  Email varchar(255)
);
insert dbo.Customers1 (CustomerId, Email) values
  ('8E38101D-988E-4BF1-B8F1-E8E0B8DAA891', '[email protected]');

create table dbo.Customers2 (
  CustomerId varchar(255),
  Email nvarchar(255)
);
insert dbo.Customers2 (CustomerId, Email) values
  ('8E38101D-988E-4BF1-B8F1-E8E0B8DAA891', '[email protected]');

-- Query data...
SELECT CustomerId
     , Email
     , HASHBYTES('MD5', CONCAT_WS('|', CustomerId, Email)) AS ConcatHash
FROM dbo.Customers1
WHERE CustomerId = '8E38101D-988E-4BF1-B8F1-E8E0B8DAA891'

SELECT CustomerId
     , Email
     , HASHBYTES('MD5', CONCAT_WS('|', CustomerId, Email)) AS ConcatHash
FROM dbo.Customers2
WHERE CustomerId = '8E38101D-988E-4BF1-B8F1-E8E0B8DAA891'

Which yields...

CustomerId	Email	ConcatHash
8E38101D-988E-4BF1-B8F1-E8E0B8DAA891	[email protected]	0xB3CF062CD2FAB8601A1B58E53D1F705B

and...

CustomerId	Email	ConcatHash
8E38101D-988E-4BF1-B8F1-E8E0B8DAA891	[email protected]	0xFACC935D24A15B73B4F6B864D3BA536
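One way to make the hashes agree, sketched here under the assumption that NVARCHAR is acceptable as the canonical type, is to cast the concatenated string to the same type on both sides before calling HASHBYTES. The table variables below are hypothetical stand-ins for dbo.Customers1 and dbo.Customers2, and the email address is a placeholder:

```sql
-- Hypothetical stand-ins: Email is VARCHAR in one table and NVARCHAR in the other.
DECLARE @Customers1 TABLE (CustomerId VARCHAR(255), Email VARCHAR(255));
DECLARE @Customers2 TABLE (CustomerId VARCHAR(255), Email NVARCHAR(255));

INSERT @Customers1 VALUES ('8E38101D-988E-4BF1-B8F1-E8E0B8DAA891', 'someone@example.com');
INSERT @Customers2 VALUES ('8E38101D-988E-4BF1-B8F1-E8E0B8DAA891', N'someone@example.com');

SELECT c1.CustomerId,
       -- Uncast: VARCHAR bytes vs NVARCHAR bytes, so these two differ.
       HASHBYTES('MD5', CONCAT_WS('|', c1.CustomerId, c1.Email)) AS RawHash1,
       HASHBYTES('MD5', CONCAT_WS('|', c2.CustomerId, c2.Email)) AS RawHash2,
       -- Cast to the same type first, so both sides hash the same UTF-16 bytes and match.
       HASHBYTES('MD5', CAST(CONCAT_WS('|', c1.CustomerId, c1.Email) AS NVARCHAR(4000))) AS FixedHash1,
       HASHBYTES('MD5', CAST(CONCAT_WS('|', c2.CustomerId, c2.Email) AS NVARCHAR(4000))) AS FixedHash2
FROM @Customers1 AS c1
JOIN @Customers2 AS c2 ON c2.CustomerId = c1.CustomerId;
```

Casting to NVARCHAR rather than VARCHAR avoids depending on a collation's code page, the point raised in the comments. The same cast could be applied to the persisted computed column in the question, e.g. ConcatHash AS (HASHBYTES('MD5', CAST(CONCAT_WS('|', CustomerId, EmailAddress) AS NVARCHAR(4000)))) PERSISTED, provided every other place that computes the hash casts the same way.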
