For a Python 3 programming assignment I have to work with Huffman coding. It's simple enough to generate the correct codes which result in a long string of 0's and 1's.
Now my problem is actually writing this string as binary and not as text. I attempted to do this:
result = "01010101 ... " #really long string of 0's and 1's
filewrt = open(output_file, "wb") #appending b to w should write as binary, should it not?
filewrt.write(result)
filewrt.close()
However, I'm still getting a large text file of 0 and 1 characters. How do I fix this?
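A minimal sketch of one fix, assuming the whole bit string fits in memory: split the '0'/'1' characters into chunks of eight, turn each chunk into a number with int(chunk, 2), and write the resulting bytes object to a file opened with "wb". The bit count is rarely a multiple of eight, so the last chunk is padded with zeros and a leading header byte records how many padding bits were added. That header layout and the helper names pack_bits and unpack_bits are illustrative assumptions, not part of any standard format.

```python
import os
import tempfile


def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes.

    Assumed (hypothetical) layout: the first byte holds the number of
    zero bits appended to fill the final byte, followed by the data.
    """
    padding = (8 - len(bits) % 8) % 8
    bits += "0" * padding
    # int(chunk, 2) turns each 8-character chunk into a value 0..255.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([padding]) + data


def unpack_bits(data: bytes) -> str:
    """Reverse pack_bits and return the original '0'/'1' string."""
    padding = data[0]
    # format(byte, "08b") renders each byte as exactly eight bit characters.
    bits = "".join(format(byte, "08b") for byte in data[1:])
    return bits[:len(bits) - padding]


if __name__ == "__main__":
    result = "0101010111"  # stand-in for the real Huffman output
    path = os.path.join(tempfile.gettempdir(), "huffman_demo.bin")
    with open(path, "wb") as filewrt:  # a "wb" file accepts bytes, not str
        filewrt.write(pack_bits(result))
    with open(path, "rb") as filerd:
        assert unpack_bits(filerd.read()) == result
```

Ten bits of input become three bytes on disk (one header byte plus two data bytes) instead of ten characters of text.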
EDIT: It seems I simply don't understand how to represent an arbitrary bit in Python 3.
Based on this SO question I devised this ugly monstrosity:
for char in result:
    filewrt.write( bytes(int(char, 2)) )
Instead of getting anywhere close to working, it output a zero-filled file that was twice as large as my input file. Can someone please explain how to represent arbitrary binary data? And in the context of building a Huffman tree, how do I concatenate or join bits based on their leaf locations, if I should not use a string to do so?
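The zero-filled output has a concrete cause: given a single integer n, bytes(n) returns n zero bytes rather than one byte whose value is n. So each '1' character became b'\x00' and each '0' became nothing at all. Even with that fixed, writing one byte per bit wastes seven bits out of every eight; eight bits need to be combined into each byte. The comparison below uses a made-up 8-bit input purely for illustration:

```python
bits = "01000001"  # illustrative input: exactly one byte's worth of bits

# Original attempt: bytes(0) == b'' and bytes(1) == b'\x00',
# so only the '1' characters produce output, and it is always zero.
buggy = b"".join(bytes(int(char, 2)) for char in bits)

# bytes([n]) makes one byte with value n, but this still spends
# a full byte on every single bit.
one_byte_per_bit = b"".join(bytes([int(char, 2)]) for char in bits)

# Interpreting all eight characters as one base-2 number gives one byte.
packed = bytes([int(bits, 2)])

print(buggy)             # b'\x00\x00'
print(one_byte_per_bit)  # b'\x00\x01\x00\x00\x00\x00\x00\x01'
print(packed)            # b'A'
```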
result is a Unicode string that happens to contain 0 and 1 characters. Writing it to a binary output stream is a type error: in Python 3, calling write() with a str on a file opened with "wb" raises a TypeError. Are you sure you aren't running it under Python 2?

The bytes type is used to represent an arbitrary byte array. Its contents could be encoded text, in which case .decode() will decode it into a Unicode string, or any other binary data. In either case, you can access individual bytes by indexing (each element is an integer from 0 to 255) and manipulate those values with bitwise operators. If you want to store structures as binary objects (e.g. tree nodes), you can also use pickle to convert between bytes and Python objects.
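Working with integer values and bitwise operators also covers the follow-up about joining bits without strings: a Huffman code for a leaf can be kept as an integer plus a bit length and appended with a shift and an OR. Below is a minimal sketch of that idea; the BitWriter class, its method names, and the sample code table are hypothetical. It writes the most significant bit first and zero-pads the final byte. A decoder would also need to know how many bits are meaningful (for example, a bit count stored in a header), which this sketch does not handle.

```python
class BitWriter:
    """Accumulate variable-length codes MSB-first and emit whole bytes.

    Hypothetical helper: a Huffman code is passed as an integer value
    plus its length in bits, e.g. code "101" -> (0b101, 3). The value
    is assumed to fit in the given number of bits.
    """

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0    # pending bits not yet written out
        self.nbits = 0  # how many pending bits are held in acc

    def write(self, value: int, length: int) -> None:
        # Shift existing bits left to make room, then OR in the new code.
        self.acc = (self.acc << length) | value
        self.nbits += length
        # Move every complete byte from the top of acc into out.
        while self.nbits >= 8:
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        # Keep only the bits that have not been emitted yet.
        self.acc &= (1 << self.nbits) - 1

    def getvalue(self) -> bytes:
        """Return the bytes written so far, zero-padding the final byte."""
        data = bytes(self.out)
        if self.nbits:
            data += bytes([(self.acc << (8 - self.nbits)) & 0xFF])
        return data


def encode(text: str, codes: dict) -> bytes:
    """Encode text with a table mapping symbol -> (code value, code length)."""
    writer = BitWriter()
    for symbol in text:
        writer.write(*codes[symbol])
    return writer.getvalue()


# Hypothetical code table: "a" -> 0, "b" -> 10, "c" -> 11.
EXAMPLE_CODES = {"a": (0b0, 1), "b": (0b10, 2), "c": (0b11, 2)}

if __name__ == "__main__":
    print(encode("abcab", EXAMPLE_CODES))  # b'Z'  (bits 01011010)
```

Because each code is stored as an int rather than a string, the bit string never exists in memory; only finished bytes accumulate in the bytearray.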