I have a large binary file (60 GB) that I want to split into several smaller files. I iterated over the file and recorded the offsets at which I want to split it using the fileObject.tell() method, so now I have an array of 1000 split points called file_pointers. I am looking for a way to create files from those split points, so the function would look like:

def split_file(file_object, file_pointers):
     # Do something here

and it would create a file for every chunk. I saw this question, but I am afraid looping in Python could be too slow, and I also feel there must be a built-in function that does something similar.

1 Answer

This is a lot simpler than I thought, but I will post my answer here in case anyone wants a quick solution. Here is an example of copying the chunk between file_pointers[1] and file_pointers[2]:

with open('train_example.bson', 'rb') as fbson:
    fbson.seek(file_pointers[1])
    bytes_chunk = fbson.read(file_pointers[2] - file_pointers[1])
    with open('tmp.bson', 'wb') as output_file:
        output_file.write(bytes_chunk)
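
The same seek-and-read idea can be extended to every pair of neighbouring split points, which gives the split_file function asked for in the question. The following is only a sketch under some assumptions: file_pointers is sorted in ascending order and already contains the first and last offsets you care about (anything before the first or after the last pointer is not written), and the out_pattern and buffer_size parameters are hypothetical names added here for illustration. It copies each chunk in bounded blocks instead of a single read(), so a very large chunk does not have to fit in memory.

```python
def split_file(file_object, file_pointers,
               out_pattern="chunk_{:04d}.bin",   # hypothetical output naming scheme
               buffer_size=16 * 1024 * 1024):    # hypothetical block size (16 MiB)
    """Write the bytes between consecutive split points to separate files.

    Assumes file_object is opened in binary mode and file_pointers is sorted.
    Returns the list of paths that were written.
    """
    paths = []
    # Pair each split point with the next one: (p0, p1), (p1, p2), ...
    for index, (start, end) in enumerate(zip(file_pointers, file_pointers[1:])):
        path = out_pattern.format(index)
        file_object.seek(start)
        remaining = end - start
        with open(path, "wb") as output_file:
            # Copy in blocks so memory use stays bounded by buffer_size.
            while remaining > 0:
                block = file_object.read(min(buffer_size, remaining))
                if not block:
                    break  # reached end of file earlier than expected
                output_file.write(block)
                remaining -= len(block)
        paths.append(path)
    return paths
```

With 1000 split points the Python-level loop runs only 999 times plus one iteration per block, so the run time should be dominated by disk I/O rather than by the interpreter.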