
I am receiving binary data that can be in different file formats. How can I save it to a "real" file?

I tried using BinaryWriter, but the resulting file is not correct: when I open it I get an encoding error, even though I do set the encoding.

https://learn.microsoft.com/en-us/dotnet/api/system.io.binarywriter?redirectedfrom=MSDN&view=net-6.0

I can provide code later if needed, but I am not sure whether BinaryWriter is the correct class for this.

Below is what the binary string looks like for a Word document (truncated):

------=_Part_174495_1036280534.1637933726817
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Content-Transfer-Encoding: binary
Content-Disposition: attachment; filename="Dummy_attachment_Ariba.docx"
Content-ID: <000D3A2BB3F41EEC928A7BA5E05A5B2C>
    PK    ! ?b\?x  ?  [Content_Types].xml ?(?                                                                    !jgs6?,+??v????Sz???*a???? ????b4?y??4?m????q?J3??R?p?Hj?^?w? ~=?p?,??+6=@!V??-??I?????????h)??|m???I?H??K??50~4??|??^h4A+H?"?(??o\P\9?*I???9??BKh???NB?4??dm?????3?????D??8w"l`??'?N??9????u'X????s?D17????M?sx6???T$uN??6[?õ??R?ta??I??d}????
    ?o??*?+??m????Of?  ?? PK    ! ?U~?   ?  _rels/.rels ?(?                                                    ??MK1???!?;?*"??^D?Md?C2????????(?.??3y??3C???+?4xW??(A??????yX?JB???Wp????b??#InJ????*?E?b?=[J???M?%???a ??????9m?.?????3???Y?  ?? PK    ! ??f1?  ?b?R???1?EF7Z?n???hY?jy??#1'?<???7
     word/document.xml??[o?0??'?? ?[CBsAM???=L???yr?V?E????C?Tt?/??|????????I??????? 2a"]??~~?????X$8??.?#5?????"N$?s*a?B???Y?b??(??3???[{M$Gr?e??B???0(??????8`?p?-? e?????e?Cn???? D8
???U    r^u@? x?!?#??di?%M???]?l?SN?[?RQ?[?9???)?X???
?
?'??^?????">?_5??????5?????:e?H?r!??jv8J???????Z?Pa????iU???q???W??O?+??F^?=?P???A?9Kn?? ??`BX??U6!?<?z??#o?z??U??{????h??_?[????w???3?Vp$pK??x??GPC??W???ªxn??Kx*ldrt???????i4~??v???h~?oWt???=?)1k?]5?Hp???G??y=?N?U~??@l??j?????b???{?6??J?J??????,W?V`Y??$?`?????"i$+????n??_B???.&85?p??"??2*?*???J8??(*=?,?l??Hk%o?9??f'?N???n??g?to?nG??|?   ?d?axW>iW=q?]3K?????????
  9  word/_rels/document.xml.rels ?(?                                                                            ???N?0??H???w?@A?N/?R??M6?"YG???c??PE=??c???Zu??@?C
?(?????J?[??y?XS?[C?`@???j???f???w»?SP3?OR???N???H??4???G[?^??B???SHO<YP`??-?l??oS?M??&?wH|&B~??????BV0#?<?CH???? 
8
  • Which encoding error? Commented Nov 26, 2021 at 16:58
  • What is binary text? It sounds like a contradiction. Commented Nov 26, 2021 at 16:58
  • You can't tell whether a binary file is correct just by looking at it, unless you check it char by char (and maybe not even then) or you are Neo. Focus on the error. Commented Nov 26, 2021 at 16:59
  • @jps I have no idea :-) What would you call it? I get the so-called "text" in a POST multipart request. Commented Nov 26, 2021 at 17:02
  • @ThomasAdrian That would be the payload, body, or any other term. It isn't "text" unless it is known to be text via some known text encoding (often UTF-8). Commented Nov 26, 2021 at 17:06

2 Answers


BinaryWriter is almost never the right tool for any job - it doesn't do what people usually think. What you probably want here is simply a Stream (e.g. the one returned by File.Create(...)). You would obtain the data from ... wherever it is coming from, and use the various Write APIs to append it, usually in chunks.
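As a minimal sketch of that idea, assuming the incoming data is already available as a Stream (for example a request body), the bytes can be copied to a file without ever becoming text. The class name BinarySave, the method name SaveToFile, and the buffer size are made up for illustration.

```csharp
using System;
using System.IO;

static class BinarySave
{
    // Copies everything from 'source' into a new file at 'path' in fixed-size
    // chunks and returns the number of bytes written. No text decoding happens.
    public static long SaveToFile(Stream source, string path)
    {
        long total = 0;
        var buffer = new byte[81920]; // chunk size is an arbitrary choice

        using (FileStream target = File.Create(path))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
                total += read;
            }
        }

        return total;
    }
}
```

If the byte count is not needed, source.CopyTo(target) performs the same chunked copy in a single call.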

If the data is not known to be encoded text, then the moment you hold it as string or char[] (or similar) data, you've corrupted it, so: don't do that. Stay purely in binary.

If the data is known to be encoded text, but you don't know the precise encoding used, then frankly: treat it as binary.
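To illustrate why holding arbitrary bytes as a string corrupts them, here is a small sketch that decodes a byte array as UTF-8 and encodes it again. UTF-8 is only an assumed example encoding, and the names RoundTripDemo and SurvivesUtf8RoundTrip are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripDemo
{
    // Returns true only if decoding 'original' as UTF-8 text and encoding the
    // resulting string again yields exactly the same bytes.
    public static bool SurvivesUtf8RoundTrip(byte[] original)
    {
        string asText = Encoding.UTF8.GetString(original);
        byte[] back = Encoding.UTF8.GetBytes(asText);
        return original.SequenceEqual(back);
    }
}
```

Plain ASCII survives the round trip, but a byte such as 0xFF is not valid UTF-8, so the decoder substitutes a replacement character and the original value is lost. The many "?" characters in the dump in the question are likely the same kind of substitution.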

Anything more than that: would require specific examples of what you're doing.


1 Comment

thanks, will provide some code during the weekend

The real question is which encoding it is; maybe the data is already corrupted. This may be the wrong approach, but because it is a Word file I would try to brute-force it: write the data out once per available encoding, then try to open each file with a Word API. One of them may work or they may all fail, but it wouldn't take long to find out.

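using System;
using System.IO;
using System.Linq;
using System.Text;

// 'data' is the received string; it is assumed to be defined elsewhere.
var encodings = Encoding.GetEncodings().ToList();

// Write one candidate file per encoding known to the runtime.
encodings.ForEach(encoding =>
{
    File.WriteAllBytes($"{encoding.Name}.docx", Encoding.GetEncoding(encoding.Name).GetBytes(data));
});

// Try to open each candidate; an exception means that encoding did not work.
encodings.ForEach(encoding =>
{
    try
    {
        /* to do: open $"{encoding.Name}.docx" with a Word API */
        Console.WriteLine($"{encoding.Name} works");
    }
    catch { }
});
Console.WriteLine("finished");
Console.ReadKey();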
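The "open with a Word API" step is left open above. One possible stand-in, not a Word API, relies on the fact that a .docx file is a ZIP archive: the check below only tests whether the file opens as a ZIP and contains the two entries visible in the dump in the question. The names DocxCheck and LooksLikeDocx are hypothetical, and passing this check does not prove that Word can open the file.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DocxCheck
{
    // Rough plausibility check: the file must be a readable ZIP archive that
    // contains the entries a .docx package is expected to have.
    public static bool LooksLikeDocx(string path)
    {
        try
        {
            using (ZipArchive zip = ZipFile.OpenRead(path))
            {
                return zip.GetEntry("[Content_Types].xml") != null
                    && zip.GetEntry("word/document.xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            // Thrown when the file is not a valid ZIP archive.
            return false;
        }
    }
}
```

Inside the try block of the loop above, a call such as DocxCheck.LooksLikeDocx($"{encoding.Name}.docx") could decide whether to print the "works" line.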

If you have control over the sending side, use Base64; that has usually worked fine for me in HTTP requests. But if I understand correctly, that is not the case here.
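For reference, a minimal sketch of the Base64 approach, assuming both sender and receiver are under your control. The wrapper names Base64Transport, Encode, and Decode are hypothetical; Convert.ToBase64String and Convert.FromBase64String are the standard .NET calls.

```csharp
using System;

static class Base64Transport
{
    // Sender side: turn raw bytes into text that is safe to send as a string.
    public static string Encode(byte[] payload) => Convert.ToBase64String(payload);

    // Receiver side: recover the exact original bytes from that text.
    public static byte[] Decode(string text) => Convert.FromBase64String(text);
}
```

The decoded bytes can then be written to disk directly, for example with File.WriteAllBytes. The trade-off is that Base64 makes the payload roughly a third larger.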
