
For a Python 3 programming assignment I have to work with Huffman coding. It's simple enough to generate the correct codes which result in a long string of 0's and 1's.

Now my problem is actually writing this string as binary and not as text. I attempted to do this:

result = "01010101 ... " #really long string of 0's and 1's
filewrt = open(output_file, "wb") #appending b to w should write as binary, should it not?
filewrt.write(result)
filewrt.close()

However, I'm still getting a large text file of 0 and 1 characters. How do I fix this?

EDIT: It seems I simply don't understand how to represent an arbitrary bit in Python 3.

Based on this SO question I devised this ugly monstrosity:

for char in result: 
    filewrt.write( bytes(int(char, 2)) )

Instead of getting anywhere close to working, it output a zeroed file that was twice as large as my input file. Can someone please explain how to represent arbitrary binary data? And in the context of building a Huffman tree, how do I go about concatenating or joining bits based on their leaf locations, if I should not use a string to do so?
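The zero-filled output has a simple cause: when `bytes()` is given a single integer, it returns that many null bytes, not one byte with that value. A minimal sketch of the difference, plus one possible way to collect Huffman codes as integer bits instead of characters. The `(code, length)` pairs, the example codes, and the `append_code` helper are hypothetical, for illustration only:

```python
# bytes(n) with a single int argument allocates n zero bytes...
print(bytes(1))    # b'\x00'
print(bytes(0))    # b''
# ...whereas bytes([n]) builds one byte whose value is n (0 <= n <= 255).
print(bytes([1]))  # b'\x01'


def append_code(acc, acc_len, code, code_len):
    """Hypothetical helper: append a code of code_len bits to an integer
    accumulator that currently holds acc_len bits; return the new pair."""
    return (acc << code_len) | code, acc_len + code_len


# Made-up codes for three leaves: '0' (1 bit), '10' (2 bits), '110' (3 bits).
acc, acc_len = 0, 0
for code, code_len in [(0b0, 1), (0b10, 2), (0b110, 3)]:
    acc, acc_len = append_code(acc, acc_len, code, code_len)

print(format(acc, "0{}b".format(acc_len)))  # 010110

# Right-pad with zero bits to a whole number of bytes, then convert.
pad = -acc_len % 8
data = (acc << pad).to_bytes((acc_len + pad) // 8, "big")
print(data)  # b'X' (0x58)
```

Keeping the bit count (`acc_len`) next to the integer matters because leading zero bits are not stored in an `int`.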

  • result is a Unicode string that happens to contain 0 and 1. Writing it to a binary output stream is a type error. Are you sure you aren't running it under Python 2? Commented Nov 17, 2013 at 23:16
  • You need to convert the zeros and ones back to bytes first; Python doesn't do that for you. Commented Nov 17, 2013 at 23:19
  • @Mechanicalsnail Pretty sure. I explicitly selected Python 3 in Aptana and have been using it all semester, so hopefully I am. Commented Nov 17, 2013 at 23:45
  • @MartijnPieters I see. Does it take a specific "bytes" object or a "bytearray"? What type of object do I need to manually convert this string into? Commented Nov 17, 2013 at 23:46
  • The bytes type is used to represent an arbitrary byte array. That could be encoded text, in which case .decode() will decode it into a Unicode string, or any other binary data. Either way, you can access individual bytes by indexing (each one is an int from 0 to 255) and apply bitwise operations to them. If you want to store structures as binary objects (e.g. tree nodes), you can also use pickle to convert between bytes and Python objects. Commented Nov 18, 2013 at 1:27
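The two behaviors described in these comments can be checked directly in Python 3. A short sketch; the file name `out.bin` is a placeholder, and the file is written to a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.bin")  # placeholder output path

# Writing a str to a file opened in binary mode raises TypeError in Python 3.
raised = False
try:
    with open(path, "wb") as f:
        f.write("0101")
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

data = b"\x48\x69\xff"
print(data[0])                   # 72: indexing a bytes object yields an int
print(data[2] & 0x0f)            # 15: bitwise operators work on those ints
print(data[:2].decode("ascii"))  # Hi: decode() only makes sense if the bytes are text
```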

1 Answer

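def intToTextBytes(n, stLen=0):
    # Convert a non-negative int to big-endian bytes, left-padded with NUL bytes to stLen
    bs = b''
    while n > 0:
        bs = bytes([n & 0xff]) + bs  # prepend the lowest 8 bits as one byte
        n >>= 8                      # shift the consumed byte away
    return bs.rjust(stLen, b'\x00')


num = 0b01010101111111111111110000000000000011111111111111
bs = intToTextBytes(num)
print(bs)
with open(output_file, "wb") as f:  # the with block closes the file when done
    f.write(bs)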
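Two caveats when applying this to a Huffman bit string: the answer's example starts from an integer literal, whereas the question has a string (which `int(result, 2)` would convert), and any leading `0` bits disappear once the string becomes an integer, so a decoder cannot tell how long the original string was. A sketch that packs the string 8 characters at a time and keeps the bit count alongside the data; `pack_bits`, `unpack_bits`, and the zero-padding scheme are assumptions, not part of the answer:

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    Returns (data, nbits). The last byte is right-padded with zero bits,
    so nbits is needed to recover the exact original string.
    """
    nbits = len(bits)
    padded = bits + "0" * (-nbits % 8)  # pad up to a whole number of bytes
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, nbits


def unpack_bits(data, nbits):
    """Inverse of pack_bits: rebuild the bit string and drop the padding."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits[:nbits]


result = "0101010111"
data, nbits = pack_bits(result)
print(data, nbits)  # b'U\xc0' 10
assert unpack_bits(data, nbits) == result
```

In an actual output file, `nbits` would also have to be stored somewhere (for example in a small header) so the decoder knows where the padding starts.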

EDIT: A more complicated, but faster (about 3 times) way:

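# Byte count from n.bit_length(): ceil(log(n, 256)) gives the wrong count for
# exact powers of 256 (e.g. 1 or 256) and raises ValueError for n == 0
intToTextBytes = lambda n, stLen=0: bytes([
    (n >> (i << 3)) & 0xff for i in range((n.bit_length() + 7) // 8 - 1, -1, -1)
]).rjust(stLen, b'\x00')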
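Since Python 3.2, `int` also has a built-in `to_bytes` method that performs the same big-endian conversion. A sketch of an equivalent function (the name `int_to_bytes` is illustrative), checked against a plain copy of the loop version from this answer:

```python
def int_to_bytes(n, st_len=0):
    """Big-endian bytes of a non-negative int, left-padded with NUL bytes to st_len."""
    length = max(st_len, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def loop_version(n, st_len=0):
    # Same algorithm as intToTextBytes in the answer, kept here for comparison.
    bs = b''
    while n > 0:
        bs = bytes([n & 0xff]) + bs
        n >>= 8
    return bs.rjust(st_len, b'\x00')


for n in [0, 1, 255, 256, 65535, 65536,
          0b01010101111111111111110000000000000011111111111111]:
    assert int_to_bytes(n) == loop_version(n)
    assert int_to_bytes(n, 10) == loop_version(n, 10)

print(int_to_bytes(256))  # b'\x01\x00'
```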

2 Comments

I can't actually use this for my assignment (plagiarism and all that) but this is definitely something I'll utilize in my own projects. Thank you!
You can use it wherever you want.
