Python: how to get a null-padded byte string from a Unicode string

Question

I'm sure someone can help me here, as it feels like such a simple answer, but I can't find it anywhere. I need to write a Unicode string (basically null-padded ASCII), but it isn't working as expected: no matter what I try from the internet, it ends up as pure ASCII.

with open('test.txt', 'wb') as oFile:
    name = u'AAA'
    oFile.write(name)  # always writes 0x414141; I want 0x410041004100

To clarify (the question is already answered, but in case someone wanders here): the use case is a mixed binary file (an int here, a Unicode string there, a struct, etc.) that I am editing in place. I really just wanted to write the string the way it is represented in the file ('AAA' as 0x410041004100 instead of 0x414141).

Wouldn't null-padded 0x41 be 0x0041 instead of 0x4100?
– MattDMo, Commented Jan 5, 2015 at 19:35
What do you mean by "a unicode string (null padded ASCII basically)"? How is Unicode the same as null-padded? Do you mean you want to encode it as UTF-16?
– BrenBarn, Commented Jan 5, 2015 at 19:37
Honestly, I hardly use Unicode, so I'm not sure. What I really want is to turn the ASCII string into the null-padded format I always see. I apologize for not knowing whether it's UTF-8 or UTF-16... what does Windows typically use?
– Ryan, Commented Jan 5, 2015 at 19:42
I can do it by hand, but I thought I could get the null-padded string directly from an existing function.
– Ryan, Commented Jan 5, 2015 at 19:44
Windows uses UTF-16 little-endian internally. That's consistent with what you say you want for a result. It's not commonly found in text files.
– Mark Ransom, Commented Jan 5, 2015 at 19:46
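Mark Ransom's point can be checked directly. A minimal sketch, assuming Python 3 (where `"AAA"` is already a Unicode string and `.encode()` returns `bytes`): for ASCII characters UTF-8 produces the same single bytes as plain ASCII, while UTF-16-LE produces the 0x410041004100 pattern from the question.

```python
name = "AAA"

utf8_bytes = name.encode("utf-8")         # one byte per ASCII character
utf16le_bytes = name.encode("utf-16-le")  # two bytes per character, low byte first, no BOM

print(utf8_bytes.hex())     # 414141
print(utf16le_bytes.hex())  # 410041004100
```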

Greg Hewgill · Accepted Answer · 2015-01-05 19:46:13Z

2

You can use the .encode() method with an appropriate codec:

>>> name = u"aaa"
>>> name.encode("utf_16")
'\xff\xfea\x00a\x00a\x00'

The \xff\xfe at the beginning is a Byte Order Mark (BOM). Your application may or may not require that, and you can remove it if not needed.
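A minimal sketch of removing the BOM, assuming Python 3. Python's `utf_16` codec prepends a BOM in the machine's native byte order, and `codecs.BOM_UTF16` holds that same native-order BOM, so the helper strips exactly those two bytes. The helper name `encode_utf16_without_bom` is hypothetical, not part of any library.

```python
import codecs

def encode_utf16_without_bom(text):
    """Hypothetical helper: encode with the native-order utf_16 codec, then drop the BOM."""
    encoded = text.encode("utf_16")
    # The utf_16 codec prepends a BOM in native byte order; remove it.
    if encoded.startswith(codecs.BOM_UTF16):
        encoded = encoded[len(codecs.BOM_UTF16):]
    return encoded

raw = encode_utf16_without_bom("aaa")
print(raw)  # b'a\x00a\x00a\x00' on a little-endian machine
```

Because the remaining byte order still depends on the machine, a file format that fixes the byte order is better served by an explicit-endian codec.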



3 Comments

Ryan Over a year ago

Awesome thanks! Appreciate the BOM portion as well :D

Mark Ransom Over a year ago

You almost certainly will want to remove the BOM if it's not the very first write to the file.

Mark Tolonen Over a year ago

Or just use utf-16le/utf-16be. They don't add a BOM.
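A short sketch of this suggestion (Python 3). Neither explicit-endian codec emits a BOM; the big-endian form gives the 0x0041 ordering MattDMo asked about, while the little-endian form gives the question's 0x4100 ordering.

```python
import codecs

name = "AAA"

for codec in ("utf-16-le", "utf-16-be"):
    encoded = name.encode(codec)
    # Neither explicit-endian codec prepends a byte order mark.
    assert not encoded.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
    print(codec, encoded.hex())
# utf-16-le 410041004100
# utf-16-be 004100410041
```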

augurar · Answer · 2015-01-05 19:54:32Z

0

You can use the codecs module to specify an encoding when you open the file:

import codecs
with codecs.open('test.txt', 'wb', encoding='utf-16') as oFile:
    ...
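To see what this approach writes, here is a minimal sketch assuming Python 3, where the built-in `open()` accepts an `encoding` argument directly and, like `codecs.open(..., encoding='utf-16')`, writes a single native-order BOM before the first write. The temporary file stands in for `test.txt`, and the two writes of `"AAA"` are hypothetical content for the `...`.

```python
import codecs
import os
import tempfile

# Temporary file standing in for test.txt.
fd, path = tempfile.mkstemp()
os.close(fd)

# Text mode with encoding="utf-16": the encoder emits one BOM at the start of the file.
with open(path, "w", encoding="utf-16") as oFile:
    oFile.write("AAA")
    oFile.write("AAA")  # second write: no second BOM

with open(path, "rb") as raw_file:
    data = raw_file.read()
os.remove(path)

print(data)  # BOM followed by six UTF-16 code units in native byte order
```

This fits a file that is entirely UTF-16 text; for a mixed binary file, encoding each string yourself and writing the bytes is simpler.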


1 Comment

Ryan Over a year ago

I know I didn't specify, but this doesn't suit my use case: I already have the file open, and it's a mixed binary file, so doing it this way would mean closing and reopening it multiple times... but thank you :)
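For the mixed-binary, edit-in-place case, one option is to keep the file in binary mode and write pre-encoded bytes at the right offset. The layout below (a little-endian 4-byte int followed by a 16-byte UTF-16-LE name field padded with zero bytes) and the helper `write_name_field` are hypothetical, chosen only to illustrate combining `struct` with `str.encode`. An in-memory `io.BytesIO` stands in for a real file opened with `'r+b'`.

```python
import io
import struct

NAME_OFFSET = 4       # hypothetical: name field follows a 4-byte int
NAME_FIELD_SIZE = 16  # hypothetical: room for 8 UTF-16 code units

def write_name_field(f, name):
    """Hypothetical helper: write name as UTF-16-LE, zero-padded to NAME_FIELD_SIZE bytes."""
    encoded = name.encode("utf-16-le")  # no BOM
    if len(encoded) > NAME_FIELD_SIZE:
        raise ValueError("name too long for field")
    f.seek(NAME_OFFSET)
    f.write(encoded.ljust(NAME_FIELD_SIZE, b"\x00"))

# Stand-in for a file opened with open(path, 'r+b').
f = io.BytesIO(struct.pack("<i", 42) + b"\x00" * NAME_FIELD_SIZE)
write_name_field(f, "AAA")
print(f.getvalue().hex())
```

Only the name field is overwritten; the int before it stays untouched, which is what editing in place needs.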
