I am editing a Word document using OpenXML. For various reasons, I convert it all into a string:

// convert the byte array to a MemoryStream
using (var file = new MemoryStream(text))
using (var reader = new StreamReader(file))
{
    WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, true);
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
}

Then I convert it back to a byte array.

But a simple conversion does not work:

byte[] back2Byte = System.Text.Encoding.ASCII.GetBytes(docText);

because the string is an OpenXML string.

I tried this, but I always get a corrupted file when I open it with Word:

var repo = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(docText));

byte[] buffer = new byte[16 * 1024];
MemoryStream ms = new MemoryStream();

int read;
while ((read = repo.Read(buffer, 0, buffer.Length)) > 0)
{
    ms.Write(buffer, 0, read);
}

byte[] back2Byte = ms.ToArray();

This doesn't work either:

byte[] back2Byte = new byte[docText.Length * sizeof(char)];
System.Buffer.BlockCopy(docText.ToCharArray(), 0, back2Byte, 0, back2Byte.Length);

Edit: After some checking, it seems the data is written to the database as raw OpenXML markup, so Word cannot read it. There is no error when I open it with Notepad.

How can I correct this?

So the real question is: how can I convert an OpenXML string to a byte array that can be opened in Word?

  • Will a byte array serve your purposes? Commented May 5, 2014 at 14:08
  • Yes, because I store it in a DB as a blob, so it is a byte array in C#. Commented May 5, 2014 at 14:09
  • I suspect that you're not encoding the data properly in reading the stream into docText. What does that string look like? Strings can't store arbitrary data unless you use an encoding designed for that, like base64. See haacked.com/archive/2012/01/30/… Commented May 5, 2014 at 14:12
  • @TimS. The string is in OpenXML format, and that seems to be the issue: the bytes are written as XML, so Word cannot open the result. So the real question is: how can I convert an OpenXML string to a byte array that can be opened in Word? Commented May 5, 2014 at 14:38
  • This is all kinds of wrong. You cannot encode a Unicode string as an ASCII string. It's impossible. There is no conversion that would allow that. And it's foolish to try to change your data so it fits your storage system. Change your storage system so it fits your data. You also need to move away from the idea of bytes: you are dealing with characters here. The database will support the notion of characters, just use the right data type. Commented May 5, 2014 at 14:57

1 Answer

You cannot do it this way. You are getting the bytes for only one part of an OpenXML document. Office Open XML files (.docx, .xlsx, .pptx) are multi-part packages by definition. You could theoretically capture the bytes for all the parts using a technique like the one you're currently using, but you would also have to capture all the part and relationship information needed to reconstruct the multi-part document. You'd be better off just reading all the bytes of the file and storing them as-is:

// to read the file as bytes
var fileName = @"C:\path\to\the\file.xlsx";
var fileBytes = File.ReadAllBytes(fileName);

// to recreate the file from the bytes
File.WriteAllBytes(fileName, fileBytes);

If you need a string form of those bytes, try this:

// to convert bytes to a (non-readable) text form
var fileContent = Convert.ToBase64String(fileBytes);

// to convert base-64 back to bytes
var decodedBytes = Convert.FromBase64String(fileContent);

Either way, there is absolutely no need to use the OpenXML SDK for your use case.

6 Comments

But I cannot convert to non-readable text, because I must replace some text in it. Currently, I have a document stored as a blob in my DB. I get it as a byte[], convert it to a readable string to make changes in the text, then convert it back to bytes and store it in the database again. But according to Tomalak, that is not a good way to do it (blob and bytes...). I just heard of OpenXML and thought I could do what I explained more easily.
You want to replace some text within the body of the document? OK, you can do that, but you need to use the OpenXML SDK to extract the string and write it back to a WordprocessingDocument when you are done. You cannot store only a single part of an OpenXML document and expect it to work.
And once it is written back into the WordprocessingDocument, I must store it in my database again. Can I do this?
When you programmatically modify an OpenXML document, the changes are automatically saved into the underlying file/stream (unless AutoSave is false) when the document is disposed. Wrap the wordDoc variable in a using block. After that using block, the MemoryStream referenced by your file variable has the modified content. Just seek to the beginning of that stream and read out the bytes.
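
The round trip described in the comments above could look roughly like the following. This is a minimal sketch, not code from the answerer: the class name DocxTextReplacer, the method name ReplaceText, and its parameters are hypothetical, and it assumes the DocumentFormat.OpenXml package is referenced. It copies the blob into an empty MemoryStream first, because a MemoryStream constructed directly over a byte[] cannot grow if the edited package ends up larger than the original.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper, for illustration only.
public static class DocxTextReplacer
{
    // Takes the complete .docx bytes (e.g. the blob from the database),
    // replaces text in the main document part, and returns the complete
    // modified .docx bytes, ready to be stored again.
    public static byte[] ReplaceText(byte[] docxBytes, string oldText, string newText)
    {
        using (var stream = new MemoryStream())
        {
            // Copy into an expandable stream; new MemoryStream(docxBytes)
            // would have a fixed capacity.
            stream.Write(docxBytes, 0, docxBytes.Length);

            using (var wordDoc = WordprocessingDocument.Open(stream, true))
            {
                // Read the XML of the main document part as a string.
                string docText;
                using (var reader = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
                {
                    docText = reader.ReadToEnd();
                }

                docText = docText.Replace(oldText, newText);

                // Overwrite the main document part with the edited XML.
                using (var writer = new StreamWriter(
                    wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
                {
                    writer.Write(docText);
                }
            } // Disposing wordDoc saves the package back into the stream.

            // ToArray returns the whole buffer regardless of the stream
            // position, so no explicit seek is needed here.
            return stream.ToArray();
        }
    }
}
```

The returned array is the whole package, not just one part, so it can be written back to the blob column and opened in Word. Note that a plain string replace works on the raw XML: text that Word has split across several runs will not match, and the replacement text must not contain unescaped XML characters such as & or <.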