
I have a file consisting of three parts:

  1. an XML header (Unicode);
  2. ASCII character 29 (group separator);
  3. a numeric stream running to the end of the file.

I want to get the XML string from the first part, and the numeric stream (to be parsed with struct.unpack or array.fromfile).

Should I create an empty string and append to it, reading the file byte by byte until I find the separator, as shown here?

Or is there a way to read everything at once and use something like xmlstring = open('file.dat', 'rb').read().split(chr(29))[0] (which, by the way, doesn't work)?

EDIT: this is what I see using a hex editor; the separator is there (selected byte):

[Screenshot: hex editor view of the file with the separator byte selected]

  • In what way does .split(29) not work? Does it produce an error message? Please provide a short, complete program that demonstrates the error you are having. Commented Apr 7, 2015 at 18:11
  • Can you show a sample input and the expected output for your file? Commented Apr 7, 2015 at 18:11
  • It would be a bit difficult for me to create code right now (I am already receiving the file generated elsewhere). Commented Apr 7, 2015 at 18:13
  • The code you have pasted works fine for me. In what way does it not work for you? Commented Apr 7, 2015 at 18:15
  • @Robᵩ it returns the whole file, not just the part before chr(29) . Commented Apr 7, 2015 at 18:16

3 Answers


Your attempt at searching for chr(29) didn't work because in that expression 29 is a value in decimal notation. The value shown by your hex editor, however, is in hex, so it's 0x29 (41 in decimal, the printable ")" character).

You can simply do the conversion in Python - 0xnn is just another notation for entering an integer literal:

>>> 0x29
41
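The difference between the two readings of "29" can be checked directly. A small Python 3 sketch (the names gs and paren are only illustrative); note that hex() returns a text string, not a byte:

```python
# Decimal 29 is the ASCII "group separator" control character (0x1D);
# hex 0x29 is decimal 41, the printable ")" character.
gs = bytes([29])       # the byte chr(29) / b'\x1d' refers to
paren = bytes([0x29])  # the byte a hex editor displays as "29"

print(gs, paren)       # b'\x1d' b')'
print(0x29, hex(29))   # 41 0x1d  (hex() gives the 4-character text '0x1d')
```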

You can then use str.partition (bytes.partition on Python 3) to split the data into its respective parts:

SEP = b'\x29'  # the byte the hex editor shows as "29", i.e. ")"

with open('file.dat', 'rb') as infile:
    data = infile.read()

xml, sep, binary_data = data.partition(SEP)

Demonstration:

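import random

SEP = b'\x29'

# Write a small test file: an XML header, the separator, then 1024 random bytes.
with open('file.dat', 'wb') as outfile:
    outfile.write(b"<doc></doc>")
    outfile.write(SEP)
    data = bytes(bytearray(random.randint(0, 255) for i in range(1024)))
    outfile.write(data)


with open('file.dat', 'rb') as infile:
    data = infile.read()

# partition splits at the FIRST separator only, so random 0x29 bytes
# inside the payload do not matter.
xml, sep, binary_data = data.partition(SEP)

print(xml.decode('ascii'))
print(len(binary_data))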

Output:

<doc></doc>
1024
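The answer stops once the data is split; the question also asks about parsing the numeric stream. A minimal Python 3 sketch of both routes it mentions, assuming (hypothetically) that the stream is little-endian 32-bit floats; the real element type and byte order depend on whatever program writes the file. Since the bytes are already in memory, it uses array.frombytes rather than array.fromfile:

```python
import array
import struct
import sys

# Hypothetical stand-in for binary_data: three little-endian float32 values.
binary_data = struct.pack('<3f', 1.0, 2.5, -4.0)

# Route 1: struct.unpack with an item count derived from the buffer length.
count = len(binary_data) // struct.calcsize('<f')
values = struct.unpack('<%df' % count, binary_data)

# Route 2: array stores items in native byte order,
# so swap when the host is big-endian but the data is little-endian.
arr = array.array('f')
arr.frombytes(binary_data)
if sys.byteorder == 'big':
    arr.byteswap()

print(values)        # (1.0, 2.5, -4.0)
print(arr.tolist())  # [1.0, 2.5, -4.0]
```

With the random payload from the demonstration above, the same code would run but yield meaningless numbers, since those bytes were never floats.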

7 Comments

This gives me the whole file as the first element, and two additional empty strings.
Then your file simply does not contain the ASCII character 29 - might it be that 29 is in hex notation instead of decimal? Try chr(0x29) as the separator instead like in my updated answer.
The byte is there, see my attached screencapture.
@Tui Popenoe you don't say... My suggestion was that 29 was already the hex representation, and therefore the correct value to search for would be decimal 41. If you just substitute decimal 29 with 0x1d you don't change a thing.
@heltonbiker yes, the hex editor displays the values in hex, so chr(0x29) or chr(41) is the correct value to search for.
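The result described in the first comment (the whole file plus two empty strings) is exactly what partition returns when the separator does not occur. A small Python 3 sketch with made-up data:

```python
data = b'<doc></doc>\x29\x00\x01\x02'

# Separator present: (before, separator, after).
found = data.partition(b'\x29')
print(found)    # (b'<doc></doc>', b')', b'\x00\x01\x02')

# Separator absent: (whole input, b'', b'').
missing = data.partition(b'\x1d')
print(missing)  # (b'<doc></doc>)\x00\x01\x02', b'', b'')
```

One caveat: 0x29 is the printable ")" character, so if the XML header itself can contain ")", partition on that byte would split too early.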

mmap the file, search for the 29, create a buffer or memoryview from the first part to feed to the parser, and pass the rest through struct.
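A minimal Python 3 sketch of that approach, assuming the separator is the byte 0x29 and (for illustration only) that the numeric stream is little-endian float32. The function name read_parts is hypothetical. Instead of holding a memoryview, which would block closing the map while it is alive, it decodes the numbers in place with struct.unpack_from and copies out only the small XML header:

```python
import mmap
import os
import struct
import tempfile

def read_parts(path, sep=b'\x29'):
    """Split a file at the first `sep` byte using a read-only memory map."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(sep)
            if pos == -1:
                raise ValueError('separator not found')
            # Kept as bytes so an XML parser can honor the document's
            # own encoding declaration.
            xml_bytes = mm[:pos]
            # Assumed layout: little-endian float32; trailing bytes ignored.
            count = (len(mm) - pos - 1) // 4
            values = struct.unpack_from('<%df' % count, mm, pos + 1)
    return xml_bytes, values

# Usage with a throwaway file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), 'file.dat')
with open(path, 'wb') as out:
    out.write(b'<doc></doc>' + b'\x29' + struct.pack('<2f', 1.5, -2.0))
xml_bytes, values = read_parts(path)
print(xml_bytes, values)  # b'<doc></doc>' (1.5, -2.0)
```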

2 Comments

Would it be better than simply reading one byte at a time until finding the separator byte, or than loading the whole file into a StringIO and performing the same search in memory?
An mmapped file exists as a byte array in the file cache; either of those options will be both slower and less flexible.
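For comparison, the byte-at-a-time approach from the question might look like the following Python 3 sketch. The function name split_stream, the 0x29 separator, and the 'f' (native float) item type are assumptions; the remainder is read in one call and loaded with array.frombytes:

```python
import array
import io

def split_stream(f, sep=b'\x29', typecode='f'):
    """Read header bytes one at a time until `sep`, then bulk-read the rest."""
    header = bytearray()
    while True:
        b = f.read(1)  # one byte per call: simple, but slow for big headers
        if not b:
            raise ValueError('separator not found')
        if b == sep:
            break
        header += b
    numbers = array.array(typecode)  # native byte order and item size
    rest = f.read()
    usable = len(rest) - len(rest) % numbers.itemsize
    numbers.frombytes(rest[:usable])
    return bytes(header), numbers

f = io.BytesIO(b'<doc></doc>' + b'\x29' + array.array('f', [1.0, 2.0]).tobytes())
header, numbers = split_stream(f)
print(header, numbers.tolist())  # b'<doc></doc>' [1.0, 2.0]
```

The loop makes one read call per header byte, which is the overhead the comment above is pointing at; the mmap version searches the whole mapping in a single find.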

Make sure you are reading the file in before trying to split it. In your code, you don't have a .read().

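with open('file.dat', 'rb') as f:
    data = f.read()
    if b'\x1d' in data:  # decimal 29, the ASCII group separator
        xmlstring = data.split(b'\x1d')[0]
    elif b'\x29' in data:  # hex 29 (decimal 41); hex(29) would be the text '0x1d'
        xmlstring = data.split(b'\x29')[0]
    else:
        xmlstring = 'separator not found!'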

Ensure that an ASCII 29 character (\x1d) exists in your file.
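One way to check is to report where each candidate byte first occurs and how often. A small Python 3 sketch; the function name locate_separators and the sample bytes are hypothetical, and for the real file you would pass open('file.dat', 'rb').read() instead:

```python
def locate_separators(data, candidates=(0x1d, 0x29)):
    """Map each candidate byte value to (first offset or -1, occurrence count)."""
    return {value: (data.find(bytes([value])), data.count(bytes([value])))
            for value in candidates}

sample = b'<doc></doc>\x29\x00\x01'
report = locate_separators(sample)
print(report)  # {29: (-1, 0), 41: (11, 1)}
```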

2 Comments

Thanks, there was a typo in my sample code. I was already doing this, but it didn't work as expected.
Regarding your last phrase, the group separator byte can be seen in a HexEditor.
