16

I'm having an issue parsing data after reading a file. I'm reading in a binary file and need to build a list of attributes from it. Every piece of data in the file is terminated with a null byte, so what I'm trying to do is find every null-byte-terminated attribute.

Essentially taking a string like

Health\x00experience\x00charactername\x00

and storing it in a list.

The real issue is that I need to keep the null bytes intact; I just need to find each null byte and store the data that precedes it.

4 Answers

11

Python doesn't treat NUL bytes as anything special; they're no different from spaces or commas. So, split() works fine:

>>> my_string = "Health\x00experience\x00charactername\x00"
>>> my_string.split('\x00')
['Health', 'experience', 'charactername', '']

Note that split is treating \x00 as a separator, not a terminator, so we get an extra empty string at the end. If that's a problem, you can just slice it off:

>>> my_string.split('\x00')[:-1]
['Health', 'experience', 'charactername']
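
One caveat for Python 3: a file opened in 'rb' mode returns bytes rather than str, so the separator has to be a bytes literal as well. Here is a minimal sketch using the sample data from the question; the helper name split_terminated and its keep flag are made up for illustration, not part of any library:

```python
def split_terminated(data, terminator=b"\x00", keep=False):
    """Split bytes on a terminator, dropping the empty piece after the last one."""
    parts = data.split(terminator)
    # split() treats the terminator as a separator, so data that ends with
    # the terminator produces one trailing empty piece; discard it.
    if parts and parts[-1] == b"":
        parts.pop()
    if keep:
        # Re-attach the terminator. Note: if the input did not end with a
        # terminator, this also adds one to the final piece.
        return [part + terminator for part in parts]
    return parts


data = b"Health\x00experience\x00charactername\x00"
names = split_terminated(data)
names_with_nul = split_terminated(data, keep=True)
```

With keep=True each item still carries its trailing null byte, so joining the items back together reproduces the original input.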

2 Comments

I forgot to say in my initial question that I need to keep all of the null bytes in place; I just need to be able to take the input and find each null byte. Sorry I didn't clarify that initially.
@user2806298: As justhalf implies, Python's str.split method doesn't have any way to keep the separators, but it's easy to just add them back on to each one. For example: [s+'\x00' for s in my_string.split('\x00')[:-1]].
10

While it boils down to using split('\x00'), a convenience wrapper might be nice.

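def readlines(f, bufsize):
    # f must be opened in binary mode ('rb'), so every read returns bytes
    buf = b""
    data = True
    while data:
        data = f.read(bufsize)
        buf += data
        lines = buf.split(b'\x00')
        # the last piece may be an incomplete item; carry it into the next read
        buf = lines.pop()
        for line in lines:
            yield line + b'\x00'
    # leftover data with no terminating null byte; skip it when empty so a
    # file that ends with a null byte does not produce a stray extra item
    if buf:
        yield buf + b'\x00'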

then you can do something like

with open('myfile', 'rb') as f:
    mylist = list(readlines(f, 524288))

This has the added benefit of not needing to load the entire contents into memory before splitting the text.

2 Comments

Thanks for the help. The issue, though, is that I forgot to say in my initial question that I need to keep all of the null bytes in place; I just need to be able to take the input and find each null byte. Sorry I didn't clarify that initially.
@user2806298 Edited to keep the nullbytes in place
6

To check whether the data contains a null byte, simply use the in operator, for example:

if b'\x00' in data:

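To find its position, use find(), which returns the lowest index at which the substring is found, or -1 if it is absent. Its optional start and end arguments restrict the search to a slice of the data, so passing the index just past the previous match as start lets you locate each null byte in turn.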
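
As a rough sketch of that approach, the loop below calls find() repeatedly and slices out each item together with its terminating null byte. The function name iter_terminated is hypothetical, and the sample data is taken from the question:

```python
def iter_terminated(data, terminator=b"\x00"):
    """Yield each terminator-ended chunk of data, terminator included."""
    start = 0
    while True:
        # Search for the next terminator at or after `start`.
        end = data.find(terminator, start)
        if end == -1:
            # No more terminators; any unterminated tail is ignored.
            break
        end += len(terminator)
        yield data[start:end]
        start = end


data = b"Health\x00experience\x00charactername\x00"
attributes = list(iter_terminated(data))
```

Because each slice runs up to and including the null byte, nothing in the input is altered; trailing bytes that lack a terminator are simply not yielded.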

1

Split on null bytes; .split() returns a list:

>>> print("Health\x00experience\x00charactername\x00".split("\x00"))
['Health', 'experience', 'charactername', '']

If you know the data always ends with a null byte, you can slice the list to chop off the last empty string (like result_list[:-1]).

1 Comment

Yeah, the extra slash was there in error. I forgot to say in my initial question that I need to keep all of the null bytes in place; I just need to be able to take the input and find each null byte. Sorry I didn't clarify that initially.
