2

I used Huffman encoding that we wrote to compress a file. The function takes String and its output is String.

The problem is I want to save it as binary to get lower size than the original size, but when I take it back (0's and 1's ) as a string its size is larger than the main file. How can I convert that string of (0's and 1's) to a binary so that every character is saved in 1 bit? I am using Qt to achieve this:

string Huffman_encoding(string text)
{
    buildHuffmanTree(text);

    string encoded = "";
    unordered_map<char, string> StringEncoded;
    encoding(main_root, "", StringEncoded);

    for (char ch : text) {
        encoded += StringEncoded[ch];
    }
    return encoded;
}
2
  • Does this answer your question? How to read/write arbitrary bits in C/C++ Commented Dec 17, 2021 at 12:34
  • This is not enough code to help you. Hoffman codes don't fit into bytes (they can for example be 3 bits or 5 bits). So you need kind of a bitstream and then write that bitstream to file. Commented Dec 17, 2021 at 12:35

4 Answers 4

2

The canonical solution uses a "bit packer" that accepts bitstrings and emits packed bytes. As a first start, replace encoded by an instance of the following:

class BitPacker {
  QByteArray res;
  quint8 bitsLeft = 8;
  quint8 buf = 0;

  public:
  void operator+=(const std::string& s) {
    for (auto c : s) {
      buf = buf << 1 | c - '0';
      if (--bitsLeft == 0) {
        res.append(buf);
        buf = 0;
        bitsLeft = 8;
      }
    }
  }

  QByteArray finish() {
    if (bitsLeft < 8) {
      res.append(buf << bitsLeft);
      buf = 0;
      bitsLeft = 8;
    }
    return res;
  }
}

operator+= will add additional bits to buf and flush complete bytes to res. At the end of the process you may be left with, say, 3 bits. finish uses a simple algorithm: it pads the buffer with zeroes to produce a final byte and hands you back the fully encoded buffer.

A more sophisticated solution might be to introduce an explicit "end of stream" token that is not present in the source character set.

Sign up to request clarification or add additional context in comments.

Comments

1

Seems what you're searching for is a way to convert a string containing a sequence of 0s and 1s like "0000010010000000" to an actual binary representation (numbers 4 and 128 in this example).

This could be achieved with a function like this:

#include <iostream>
#include <string>

#include <cstdint>
#include <vector>

std::vector<uint8_t> toBinary(std::string const& binStr)
{
    std::vector<uint8_t> result;
    result.reserve(binStr.size() / 8);
    size_t pos = 0;
    size_t len = binStr.length();
    while (pos < len)
    {
         size_t curLen = std::min(static_cast<size_t>(8), len-pos);
         auto curStr = binStr.substr(pos, curLen) + std::string(8-curLen, '0');
         std::cout << "curLen: " << curLen << ", curStr: " << curStr << "\n";
         result.push_back(std::stoi(curStr, 0, 2));
         pos += 8;
    }
    return result;
}

// test:
int main()
{
  std::string binStr("000001001000000001");
  auto bin = toBinary(binStr);
  for (auto i: bin)
  {
    std::cout << static_cast<int>(i) << "  ";
  }
  return 0;
}

Output:

4 128 64

You can then do whatever you want with these numbers, e.g. write them into a binary file.

Note that toBinary as above, pads the last byte, if incomplete, with zeros.

7 Comments

@AnoopRana had pressed the submit button a bit early ;). Hope it's all good now?
Yes now it looks good.
Thanks for Trying to help but iam not trying to do that , i want them to still be 0.1 but when i save them in a file they are saving as 0's 1's not as a string of 0's and 1's ,Because String Character takes 8 bits while i want to save it as binary to take only 1 bit
not sure I understand what you mean- you want to convert a string containing '0' and '1' characters (lets call that a bit string) to actual binary data, right? that's what my function is doing. the numeric format that I convert to here takes 8 times less memory than the "bit string". and everything is binary in a computer - even a string is stored in bits; so I don't know what you mean when you "still want them be 0.1". in the computer, both representations are stored as zeros and ones in the end...
i Used it but do u have any idead how to save it to a file using qt ? ,thx for help
|
1

You can create a bitstream using bitwise logic like this :

#include <cassert>
#include <string>
#include <stdexcept>
#include <vector>


auto to_bit_stream(const std::string& bytes)
{
    std::vector<std::uint8_t> stream;
    std::uint8_t shift{ 0 };
    std::uint8_t out{ 0 };

    // allocate enough bytes to hold the bits
    // speeds up the code a bit
    stream.reserve((bytes.size() + 7) / 8);

    // loop over all bytes
    for (const auto c : bytes)
    {
        // check input
        if (!((c == '0') || (c == '1'))) throw std::invalid_argument("invalid character in input");

        // shift output by one to accept next bit
        out <<= 1;

        // keep track of number of shifts
        // after 8 shifts a byte has been filled
        shift++;

        // or the output with a 1 if needed
        out |= (c == '1');

        // complete an output byte
        if (shift == 8)
        {
            stream.push_back(out);
            out = 0;
            shift = 0;
        }
    }

    return stream;
}


int main() 
{
    // stream is 8 bits per value, values 0,1,2,3
    auto stream = to_bit_stream("00000000000000010000001000000011");

    assert(stream.size() == 4ul);
    assert(stream[0] == 0);
    assert(stream[1] == 1);
    assert(stream[2] == 2);
    assert(stream[3] == 3);
        
    return 0;
}

Comments

1

Use std::stoi()

int n = std::stoi("01000100", nullptr, 2);

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.