
I'm sure someone can help me here, as it feels like such a simple answer, but I can't find it anywhere. I need to write a Unicode string (null-padded ASCII, basically), but it isn't working as expected: no matter what I try from the internet, it ends up as pure ASCII.

with open('test.txt', 'wb') as oFile:
    name = u'AAA'
    oFile.write(name)  # always writes 0x414141, but I want 0x410041004100

Just to clarify, though the question is already answered, in case someone wanders here: the use case is a mixed binary file (an int here, a Unicode string there, a struct, etc.) that I am editing in place. I really just wanted to be able to write the string the way it is represented in the file ('AAA' as 0x410041004100 instead of 0x414141).
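A minimal Python 3 sketch of that kind of in-place edit (not the asker's code): open the existing file with `'r+b'`, which allows reading and writing without truncating, seek to where the string lives, and write its UTF-16LE bytes. The helper name `overwrite_utf16le`, the file name, and the layout (a 4-byte int followed by a 6-byte string slot at offset 4) are hypothetical. This overwrites bytes in place, so the encoded string has to fit the existing slot.

```python
import os
import tempfile

def overwrite_utf16le(path, offset, text):
    """Hypothetical helper: overwrite bytes at `offset` with `text` as UTF-16LE (no BOM)."""
    data = text.encode("utf-16-le")
    # 'r+b' opens an existing file for read/write without truncating it
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return len(data)

# Throwaway demo file with a hypothetical layout: 4-byte int, then a 6-byte string slot
path = os.path.join(tempfile.mkdtemp(), "test.bin")
with open(path, "wb") as f:
    f.write(b"\x01\x00\x00\x00" + b"\x00" * 6)

written = overwrite_utf16le(path, 4, "AAA")
with open(path, "rb") as f:
    print(f.read().hex())  # 01000000410041004100
```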

  • Wouldn't null-padded 0x41 be 0x0041 instead of 0x4100? Commented Jan 5, 2015 at 19:35
  • What do you mean by "a unicode string (null padded ASCII basically)". How is unicode the same as null padded? Do you mean you want to encode it as UTF-16? Commented Jan 5, 2015 at 19:37
  • Honestly, I hardly use Unicode, so I'm not really sure. What I'm looking to do is turn the ASCII string into the null-padded format I always see. I apologize for not knowing whether it's UTF-8 or UTF-16... what does Windows typically use? Commented Jan 5, 2015 at 19:42
  • I can do it by hand, but I thought I could get the null-padded string directly from an existing function. Commented Jan 5, 2015 at 19:44
  • Windows uses UTF-16 little-endian internally. That's consistent with what you say you want as a result. It's not commonly found in text files. Commented Jan 5, 2015 at 19:46
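To make the byte orders concrete, a small Python 3 sketch: each character of 'AAA' is the 16-bit code unit 0x0041, and the two UTF-16 variants differ only in which byte of that unit is stored first. Little-endian gives the 0x410041004100 layout from the question.

```python
name = "AAA"

le = name.encode("utf-16-le")  # little-endian: low byte (0x41) first
be = name.encode("utf-16-be")  # big-endian: high byte (0x00) first

print(le.hex())  # 410041004100
print(be.hex())  # 004100410041
```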

2 Answers


You can use the .encode() method with an appropriate codec:

>>> name = u"aaa"
>>> name.encode("utf_16")
'\xff\xfea\x00a\x00a\x00'

The \xff\xfe at the beginning is a Byte Order Mark (BOM); this one marks the data as little-endian. Your application may or may not require it, and you can remove it if it isn't needed.
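A Python 3 sketch of removing the BOM, assuming the codec writes it in the machine's native byte order (CPython's `utf-16` codec does, and `codecs.BOM_UTF16` holds the native-order BOM):

```python
import codecs

name = "aaa"
encoded = name.encode("utf-16")  # BOM followed by native-byte-order code units

# Drop the leading BOM if present; codecs.BOM_UTF16 matches the native-order BOM
if encoded.startswith(codecs.BOM_UTF16):
    payload = encoded[len(codecs.BOM_UTF16):]
else:
    payload = encoded

print(payload)
```

On a little-endian machine such as x86, `payload` equals `name.encode("utf-16-le")`; encoding with an explicit byte order avoids producing the BOM in the first place.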


3 Comments

Awesome thanks! Appreciate the BOM portion as well :D
You almost certainly will want to remove the BOM if it's not the very first write to the file.
Or just use utf-16le/utf-16be. They don't add a BOM.

You can use the codecs module to specify an encoding when you open the file:

import codecs
with codecs.open('test.txt', 'wb', encoding='utf-16') as oFile:
    ...
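For comparison, a Python 3 sketch: the built-in `open()` accepts an `encoding` argument in text mode, so `codecs.open` is not required there. The file name is a throwaway temp path, and "utf-16-le" is used instead of "utf-16" so that no BOM is written.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit codec: Python encodes each str on write.
# newline="" disables newline translation (e.g. "\n" -> "\r\n" on Windows).
with open(path, "w", encoding="utf-16-le", newline="") as oFile:
    oFile.write("AAA")

with open(path, "rb") as f:
    raw = f.read()
print(raw.hex())  # 410041004100
```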

Further information:

1 Comment

I know I didn't specify, but this approach doesn't suit my use case: I already have the file open, and it's a mixed binary file, so doing it this way would mean closing and reopening it multiple times... but thank you :)
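For a file that is already open in binary mode, one option (a sketch, not taken from either answer) is to wrap the existing handle with `codecs.getwriter`, which encodes on each `write()` and passes the bytes straight to the underlying stream, so raw `struct` data and text can be interleaved on the same handle. Here `io.BytesIO` stands in for the real file, and the record layout is hypothetical.

```python
import codecs
import io
import struct

# Stand-in for the already-open mixed binary file (hypothetical layout)
f = io.BytesIO()

# Wrap the existing binary handle; each write() encodes to UTF-16LE (no BOM)
utf16 = codecs.getwriter("utf-16-le")(f)

f.write(struct.pack("<I", 42))  # raw bytes: little-endian uint32
utf16.write("AAA")              # text, encoded and written to the same handle
f.write(struct.pack("<H", 7))   # back to raw bytes

print(f.getvalue().hex())  # 2a0000004100410041000700
```

Calling `f.write(text.encode("utf-16-le"))` directly, in the spirit of the first answer, does the same without the wrapper; the writer mainly saves repeating the codec name at every call.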
