12

I have a binary string like this: 1101100110000110110110011000001011011000101001111101100010101000

and I want to convert it to UTF-8 text. How can I do this in Python?

7
  • What encoding is the binary string in? ASCII? Or you mean the bytes are a utf-8-encoded string and you want to get a unicode string in python? Commented Oct 8, 2013 at 18:52
  • What do you mean by "convert it to utf-8"? Create the characters from the binary octets? Commented Oct 8, 2013 at 18:53
  • The binary string is in UTF-8, and yes, I want to get a unicode string in Python. Commented Oct 8, 2013 at 18:55
  • I think we're not understanding precisely what sort of file you have. Could you run hd or od or a similar hex-dump utility and copy-paste the first few lines? Commented Oct 8, 2013 at 18:57
  • It's not a file. I just have a text in Persian that I converted to binary, and now I want to convert it back to the text. Commented Oct 8, 2013 at 19:04

4 Answers

19

Cleaner version:

>>> test_string = '1101100110000110110110011000001011011000101001111101100010101000'
>>> print ('%x' % int(test_string, 2)).decode('hex').decode('utf-8')
نقاب

Inverse (from @Robᵩ's comment):

>>> '{:b}'.format(int(u'نقاب'.encode('utf-8').encode('hex'), 16))
'1101100110000110110110011000001011011000101001111101100010101000'
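
For readers on Python 3, where `str` no longer has a `.decode('hex')` method, the same integer-based idea can be sketched with `int.to_bytes` and `int.from_bytes`. This is only a minimal sketch: the helper names `bits_to_text` and `text_to_bits` are hypothetical, and the bit string is assumed to have a length that is a multiple of 8.

```python
def bits_to_text(bits, encoding="utf-8"):
    """Decode a string of '0'/'1' characters into text (hypothetical helper)."""
    if not bits:
        return ""
    # Assumption: len(bits) is a multiple of 8, so this is the exact byte count.
    raw = int(bits, 2).to_bytes(len(bits) // 8, "big")
    return raw.decode(encoding)


def text_to_bits(text, encoding="utf-8"):
    """Encode text and render the bytes as one zero-padded bit string."""
    raw = text.encode(encoding)
    if not raw:
        return ""
    # Pad to 8 bits per byte so leading zero bits are not lost.
    return format(int.from_bytes(raw, "big"), "0{}b".format(len(raw) * 8))


test_string = '1101100110000110110110011000001011011000101001111101100010101000'
print(bits_to_text(test_string))
print(text_to_bits(bits_to_text(test_string)) == test_string)
```

Passing the byte count explicitly keeps any leading zero bits, which formatting the integer with `'%x'` would drop.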

15 Comments

but it doesn't work properly. it shows something else, not the first text I just converted to binary
worked, thank you. I think I should move the check to this answer. It's really simpler
And the inverse would be: s=u'نقاب'; print '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))
@Robᵩ added to the answer with minor edit (I think in this case .encode('utf-8') is unnecessary).
Note that s = "سلام" and s = u"سلام" give different results. The former fails, the latter works. But let's stop solving the new problem. @Aidin.T, if you have a problem with encoding, please open a new question.
4

Well, the idea I have is:

1. Split the string into octets.
2. Convert each octet to an integer with int, then to a byte with chr.
3. Join the bytes and decode the resulting utf-8 string into Unicode.

This code works for me, but I'm not sure what it prints because I don't have utf-8 in my console (Windows :P ).

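s = '1101100110000110110110011000001011011000101001111101100010101000'
# Split into 8-bit chunks, then turn each chunk into one byte (a Python 2 str)
octets = [s[i:i+8] for i in range(0, len(s), 8)]
u = "".join([chr(int(x, 2)) for x in octets])
# u is a utf-8 byte string; decode it to get a unicode object
d = u.decode('utf-8')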
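
The three steps can also be sketched for Python 3, where `bytes` replaces the joined `chr` string. This is an illustration under stated assumptions rather than the answerer's code: the function name `octets_to_text` is hypothetical, and the length check is an addition.

```python
def octets_to_text(bits, encoding="utf-8"):
    """Split a bit string into octets, build bytes, then decode (hypothetical helper)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Step 1: split into 8-character chunks.
    octets = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    # Step 2: each chunk becomes one integer in 0..255, collected into bytes.
    raw = bytes(int(octet, 2) for octet in octets)
    # Step 3: decode the whole byte sequence at once.
    return raw.decode(encoding)


s = '1101100110000110110110011000001011011000101001111101100010101000'
decoded = octets_to_text(s)
print(decoded)
```

Decoding happens only after all bytes are collected, because one UTF-8 character may span several octets.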

Hope this helps!

4 Comments

Hmmm, I'm somewhat suspicious of unichr. Because OP says his binary is already utf-8. utf-8 has variable character length, so I just used chr to join the raw bytes in a string and decode them later into Unicode.
@JoranBeasley - I disagree, assuming Python2. In that step he is collecting bytes, not characters. Only after he has the utf-8-encoded byte string does he want to convert.
@Robᵩ: That's my point. Nice answer, love the split('........'). I think it is basically the same idea as mine. +1
+1 - This is the same technique as mine (so obviously I approve), plus you explained yours. Questioner should move the check to this better answer.
3
>>> import re
>>> s='1101100110000110110110011000001011011000101001111101100010101000'
>>> print (''.join([chr(int(x,2)) for x in re.split('(........)', s) if x ])).decode('utf-8')
نقاب
>>> 

Or, the inverse:

>>> s=u'نقاب'
>>> ''.join(['{:08b}'.format(ord(x)) for x in s.encode('utf-8')])
'1101100110000110110110011000001011011000101001111101100010101000'
>>> 
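
A per-byte version of the inverse for Python 3 might look like the sketch below. The name `text_to_octets` and its `sep` parameter are hypothetical. Each byte is padded to 8 digits so that ASCII characters, whose byte values are below 128, still produce full octets.

```python
def text_to_octets(text, encoding="utf-8", sep=""):
    """Render each encoded byte as an 8-digit binary string (hypothetical helper)."""
    # Iterating over bytes in Python 3 yields integers, so ord() is not needed.
    return sep.join(format(byte, "08b") for byte in text.encode(encoding))


persian_bits = text_to_octets("نقاب")
ascii_bits = text_to_octets("A")  # 65 needs only 7 digits without padding
print(persian_bits)
print(ascii_bits)
```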

1 Comment

There is another question: how can I convert my text to binary in Python? I mean the inverse of my question.
1

Use:

def bin2text(s): return "".join([chr(int(s[i:i+8],2)) for i in xrange(0,len(s),8)])


>>> print bin2text("01110100011001010111001101110100")
test

2 Comments

For my text it returns this: '\xd9\x86\xd9\x82\xd8\xa7\xd8\xa8'. How can I get it to display correctly?
You want unichr(), not just chr(). docs.python.org/2/library/functions.html#unichr
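
As a hedged illustration in Python 3 terms (an assumption, since the thread uses Python 2), the value quoted above is the UTF-8 byte sequence of the text. Python 3's `chr` plays the role of Python 2's `unichr`, so the sketch below contrasts mapping each byte to its own character with decoding the whole sequence. The variable names are illustrative only.

```python
bits = '1101100110000110110110011000001011011000101001111101100010101000'

# Collect the raw bytes first; nothing is interpreted as text yet.
raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Treating every byte as its own code point gives 8 unrelated characters.
per_byte = "".join(chr(b) for b in raw)

# Decoding the whole sequence as UTF-8 gives the 4 original characters.
decoded = raw.decode("utf-8")

print(repr(raw))
print(len(per_byte), len(decoded))
print(decoded)
```

Each Persian letter here occupies two bytes, which is why the per-byte mapping cannot recover the text.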
