2

I am working on an ETL process (no primary key -> no transactional replication) between Oracle and SQL Server and using a MD5 hash to detect for differences between the source and the destination databases.

This works fine for those records where the data falls into the first 127 ASCII characters. But when there are any 'extended ascii'* characters such as ½, °, or © SQL Server's HASHBYTES function hashes these characters in a non-standard way (i.e. different from Oracle's DBMS_CRYPTO.Hash, the .Net Cryptographic library etc).

So when I run this in Oracle:

select rawtohex(
DBMS_CRYPTO.Hash (
    UTL_I18N.STRING_TO_RAW ('°', 'AL32UTF8'),
    2)
) from dual;

I get: 4723EB5AA8B0CD28C7E09433839B8FAE.

And when I run this in SQL Server:

SELECT HASHBYTES('md5', '°');

I get: EC655B6DA8B9264A7C7C5E1A70642FA7

And when I run this C# code:

string password = "°";

// byte array representation of that string
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);

// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm) CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);

// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
   // without dashes
   .Replace("-", string.Empty)
   // make lowercase
   .ToLower();

I get 4723EB5AA8B0CD28C7E09433839B8FAE i.e. the same as in Oracle and every online tool that I have used.

Is there any SQL-based solution to this problem or would I need to create a CLR stored procedure and hash the data there?


*I realise that the term is somewhat controversial

3
  • 1
    MD5 is MD5, if the output differs, the input differs - if the inputs look the same then its an encoding issue related to the binary interpretation of the input. Can you show the input and output of such a HASHBYTES() call along with your expected result? Commented Apr 29, 2016 at 15:20
  • 1
    SQL Server is returning you the hash of the string as interpreted as ISO-8859-1, HASHBYTES('md5', N'°'); would give you the UTF16 result if you can use that in Oracle? Commented Apr 29, 2016 at 15:37
  • 1
    Possible duplicate of TSQL md5 hash different to C# .NET md5 Commented Apr 29, 2016 at 16:29

2 Answers 2

3

There is no UTF-8 support in MS SQL Server as of now. Because of this, your hashes will always differ until you switch the source string to the most common denominator, in this case UTF-16 (probably).

Sign up to request clarification or add additional context in comments.

Comments

0

I have decided to around SQL Server's treatment of extended ASCII by implementing a CLR stored procedure that use the .Net cryptographic library:

using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.SqlServer.Server;

public class Functions
{
  [SqlFunction]
  public static string GetMD5Hash (string input)
  {
    var encodedPassword = new UTF8Encoding().GetBytes(input);

    var hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);

    return BitConverter.ToString(hash).Replace("-", string.Empty);
  }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.