I've noticed a peculiarity when trying to store 2003 MS word documents within a SQL Server blob (nvarchar(max)) field and then retrieve them using Entity Framework (EF1) to then convert them back into a file (which is then attached to a email).
Basic Code:
1) I insert the document into the database using the openrowset:
INSERT INTO [dbo].[Attachment]([id],[Blob])
SELECT 1, (SELECT BulkColumn FROM OPENROWSET(Bulk 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS BLOB)
2) I then retrieve the file from the database using EF1 (simplified for brevity - bypassing repo etc):
var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault()
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);
Problem:
This works, but I noticed some discrepancies with the file after pushing the file through this process, some minor formatting issues etc.
After doing some deeper digging (using VBinDiff) - it looks like some of the Unicode chars are being converted to FDFF
00DC > FDFF
original:
00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00
converted:
00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00
other example are:
BED9 > FDFF
CFD9 > FDFF
which looks to be the last in the range from here: http://www.unicode.org/charts/PDF/UFB50.pdf
Questions
1) Am I doing something silly, or is EF doing something funky when returning the string object attachment.Blob - before I then try and convert it back to the byte array?
2) Is there a better way to retrieve the exact bytes out of the blob field while still using entity framework? (or should I use a stored procedure, or SqlDataReader instead - which I really don't want to do, in order not to convolute my data access paths)