2

Possible Duplicate: Converting byte array to string and back again in C#

I am using Huffman coding to compress and decompress some text, based on code from a tutorial.

That code builds a Huffman tree and uses it for encoding and decoding. Everything works fine when I use the code directly.

In my situation, I need to get the compressed content, store it, and decompress it whenever needed.

The output of the encoder and the input to the decoder are both BitArray.

When I convert this BitArray to a String, then back to a BitArray, and decode it using the following code, I get a weird answer.

// Read the text first, then build the Huffman tree from it
string input = Console.ReadLine();
Tree huffmanTree = new Tree();
huffmanTree.Build(input);

BitArray encoded = huffmanTree.Encode(input);

// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
    Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();

// Convert the bit array to bytes (rounding up to a whole number of bytes)
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);

// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);

// Convert string back to bytes
e = Encoding.UTF8.GetBytes(output);

// Convert bytes back to bit array
BitArray todecode = new BitArray(e);

string decoded = huffmanTree.Decode(todecode);

Console.WriteLine("Decoded: " + decoded);

Console.ReadLine();

The output of the original code from the tutorial is:

[image]

The output of my code is:

[image]

Where am I going wrong? Thanks in advance.

3
  • Keep in mind that C# strings are UTF-16. Commented Feb 3, 2013 at 8:34
  • I tried ASCII and got the same weird answer. I tried Unicode (UTF-16), but it gives an answer that is half right and half junk. For example, Welcome gives WelcomESTT. Commented Feb 3, 2013 at 8:38
  • The issue is that CopyTo assumes LSB-first, whereas every other sane API in the world uses MSB-first. Thus 10000000b is equal to 1 instead of 128. Commented Mar 13, 2018 at 22:38
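
The bit-order point in the last comment can be checked with a small sketch. The class and method names (BitOrderDemo, PackLsbFirst, PackMsbFirst) are made up for illustration; only BitArray and BitArray.CopyTo come from the framework. It packs the same bits both ways so the difference is visible.

```csharp
using System;
using System.Collections;

// Hypothetical helper class, not part of the tutorial code.
static class BitOrderDemo
{
    // Packs bits the way BitArray.CopyTo does:
    // bit index 0 becomes the least significant bit of the first byte.
    public static byte[] PackLsbFirst(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bytes;
    }

    // Packs bits MSB-first instead:
    // bit index 0 becomes the most significant bit of the first byte.
    public static byte[] PackMsbFirst(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        for (int i = 0; i < bits.Length; i++)
        {
            if (bits[i])
            {
                bytes[i / 8] |= (byte)(0x80 >> (i % 8));
            }
        }
        return bytes;
    }
}
```

For the bit sequence 1,0,0,0,0,0,0,0, PackLsbFirst yields the byte 1 and PackMsbFirst yields 128. The order does not matter as long as packing and unpacking agree; new BitArray(byte[]) uses the same LSB-first convention as CopyTo.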

2 Answers

4

You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.

string output = Encoding.UTF8.GetString(e);

e is just binary garbage at this point, it is not a UTF8 string. So calling UTF8 methods on it does not make sense.

Solution: Don't convert and back-convert to/from string. This does not round-trip. Why are you doing that in the first place? If you need a string use a round-trippable format like base-64 or base-85.
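
If a string really is required, a base-64 round trip might look like the sketch below. Convert.ToBase64String and Convert.FromBase64String are the framework calls; the wrapper class Base64RoundTrip and its method names are assumptions made for illustration.

```csharp
using System;

// Hypothetical wrapper, shown only to illustrate the round trip.
static class Base64RoundTrip
{
    // Any byte sequence maps to a plain ASCII string...
    public static string ToText(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // ...and that string maps back to exactly the same bytes.
    public static byte[] FromText(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

In the question's code this would replace the Encoding.UTF8.GetString / GetBytes pair. The trade-off is size: base-64 emits four characters for every three bytes.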


0

I'm pretty sure Encoding doesn't round-trip. That is, you can't convert an arbitrary sequence of bytes to a string, then use the same Encoding to get the bytes back, and always expect them to be the same.
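
A quick way to check this claim is sketched below. Utf8RoundTripDemo and SurvivesUtf8 are hypothetical names; the Encoding.UTF8 calls are the same ones used in the question.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for testing whether bytes survive a UTF-8 round trip.
static class Utf8RoundTripDemo
{
    // Decodes the bytes as UTF-8, re-encodes the resulting string,
    // and reports whether the original bytes came back unchanged.
    public static bool SurvivesUtf8(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        byte[] back = Encoding.UTF8.GetBytes(text);
        return data.SequenceEqual(back);
    }
}
```

Bytes that happen to be valid UTF-8, such as 0x48 0x69 ("Hi"), survive. A byte such as 0xFF is never valid UTF-8, so the decoder substitutes the replacement character U+FFFD, which re-encodes as three different bytes. Compressed output is full of such sequences.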

If you want to round-trip from your raw bytes to a string and back to the same raw bytes, you need to use base64 encoding, for example:

http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx

2 Comments

But base64 gives 4/3 of the input instead of compressing it. I need compression, not encoding (source: wikipedia.org).
@GopikrishnaS Then you need to use byte[], not string. string is for character data, not binary.
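
Storing the compressed data as byte[] could be sketched as follows. BitArrayStorage, Serialize and Deserialize are hypothetical names, and the 4-byte length prefix is an assumption of this sketch, not something the tutorial code does. The prefix is there because a BitArray rebuilt from bytes always has a length that is a multiple of 8, and the padding bits may otherwise decode as extra symbols.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical storage helper; the layout (bit count, then packed bytes)
// is an assumption of this sketch.
static class BitArrayStorage
{
    // Writes the exact bit count followed by the packed bits.
    public static byte[] Serialize(BitArray bits)
    {
        byte[] packed = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(packed, 0);
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(bits.Length);   // 4-byte bit count
            writer.Write(packed);        // raw bytes, no extra prefix
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the bit count, rebuilds the BitArray, and trims the padding bits.
    public static BitArray Deserialize(byte[] stored)
    {
        using (var reader = new BinaryReader(new MemoryStream(stored)))
        {
            int bitCount = reader.ReadInt32();
            byte[] packed = reader.ReadBytes((bitCount + 7) / 8);
            var bits = new BitArray(packed);
            bits.Length = bitCount;      // drop the padding added by byte alignment
            return bits;
        }
    }
}
```

The returned byte[] can be saved with File.WriteAllBytes and loaded with File.ReadAllBytes, with no string conversion involved. The Huffman tree, or the data needed to rebuild it, would also have to be stored for decoding in a later session.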
