Encoding issue when retrieving blob from SQL Server with Entity Framework

Question

I've noticed a peculiarity when trying to store 2003 MS word documents within a SQL Server blob (nvarchar(max)) field and then retrieve them using Entity Framework (EF1) to then convert them back into a file (which is then attached to a email).

Basic Code:

1) I insert the document into the database using the openrowset:

INSERT INTO [dbo].[Attachment]([id],[Blob])     
   SELECT 1, (SELECT BulkColumn FROM OPENROWSET(Bulk 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS BLOB)

2) I then retrieve the file from the database using EF1 (simplified for brevity - bypassing repo etc):

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault()
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Problem:

This works, but I noticed some discrepancies with the file after pushing the file through this process, some minor formatting issues etc.

After doing some deeper digging (using VBinDiff) - it looks like some of the Unicode chars are being converted to FDFF

00DC > FDFF

original:

00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00

converted:

00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00

other example are:

BED9 > FDFF
CFD9 > FDFF

which looks to be the last in the range from here: http://www.unicode.org/charts/PDF/UFB50.pdf

Questions

1) Am I doing something silly, or is EF doing something funky when returning the string object attachment.Blob - before I then try and convert it back to the byte array?

2) Is there a better way to retrieve the exact bytes out of the blob field while still using entity framework? (or should I use a stored procedure, or SqlDataReader instead - which I really don't want to do, in order not to convolute my data access paths)

Tr1stan · Accepted Answer · 2011-03-08 10:06:20Z

Mitch's answer helped point out the failing in my code. For some reason (force of habit I think) I'd set the Blob field to be nvarchar(max) when, as pointed out by Mitch, SINGLE_BLOB saves the file information as a varbinary(max) which is actually what I wanted within the application anyway (See question point 2).

Solution:

Convert the database field from nvarchar(max) to varbinary(max)
Update the Entity Framework model
Change the blob field inside the EF model from string to binary`

And finally, change this

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault()
var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
var stream = new MemoryStream(bytes);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

To this:

var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault()
var stream = new MemoryStream(attachment.Blob);
var fileName = "abc123.doc";
var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Mitch Wheat · Accepted Answer · 2011-03-06 23:39:00Z

1

Importing using SINGLE_BLOB returns the contents of a file as a single-row, single-column rowset of type varbinary(max).

Rather than SINGLE_BLOB, I would suggest using SINGLE_NCLOB which reads a file as nvarchar(max).

Unicode files must be read with the SINGLE_NCLOB option shown here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'path_to_attachment\abc123.doc', SINGLE_NCLOB) AS BLOB

Ref.: Using OPENROWSET to read large files into SQL Server

Update (in response to comment): If the files are not unicode (as you tried) then when you retrieve them you should not use Unicode encoding to get the bytes:

var bytes = Encoding.ASCII.GetBytes(attachment.Blob);

edited Mar 6, 2011 at 23:39

answered Mar 6, 2011 at 3:32

Mitch Wheat

302k44 gold badges482 silver badges552 bronze badges

1 Comment

Tr1stan Over a year ago

@Mitch-Wheat: If I try and change the parameter to SINGLE_NCLOB I get the error message: SINGLE_NCLOB requires a UNICODE (widechar) input file. The file specified is not Unicode. - Thanks very much for your reply btw.

Collectives™ on Stack Overflow

Encoding issue when retrieving blob from SQL Server with Entity Framework

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related