I need to read a simple but large (500MB) binary file in Python 3.6. The file was created by a C program, and it contains 64-bit double precision data. I tried using struct.unpack but that's very slow for a large file.

Here is my simple file read:

def ReadBinary():

    fileName = 'C:\\File_Data\\LargeDataFile.bin'

    with open(fileName, mode='rb') as file:
        fileContent = file.read()

Now I have fileContent. What is the fastest way to decode it into 64-bit double-precision floating point, or read it without the need to do a format conversion?

I want to avoid, if possible, reading the file in chunks. I would like to read it decoded, all at once, like C does.

1 Answer

You can use array.array('d')'s fromfile method:

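import array
import os

def ReadBinary():
    fileName = r'C:\File_Data\LargeDataFile.bin'

    fileContent = array.array('d')
    # fromfile requires an item count: file size in bytes / bytes per double
    count = os.path.getsize(fileName) // fileContent.itemsize
    with open(fileName, mode='rb') as file:
        fileContent.fromfile(file, count)
    return fileContent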

That's a C-level read of raw machine values (native size and byte order), so no per-value decoding happens in Python. mmap.mmap could also work: create a memoryview of the mmap object and cast it to 'd'.
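The mmap route mentioned above could look roughly like the sketch below. It is only an illustration, not a tested replacement for the answer's code: the helper name map_doubles is hypothetical, a small temporary file stands in for LargeDataFile.bin, and it assumes the file was written in the reading machine's native byte order with a length that is a multiple of 8 bytes.

```python
import array
import mmap
import os
import tempfile


def map_doubles(path):
    """Map the file read-only and view its bytes as native C doubles.

    Hypothetical helper: returns the mmap object and a memoryview cast to 'd'.
    """
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # cast('d') raises TypeError unless the byte length is a multiple of 8.
    return mapped, memoryview(mapped).cast('d')


# Demo on a small temporary file (assumption: stands in for the real data file).
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    array.array('d', [1.0, 2.5, -3.0]).tofile(tmp)

mapped, doubles = map_doubles(tmp.name)
first_values = list(doubles[:3])  # indexing and slicing yield Python floats
doubles.release()                 # release the view before closing the mapping
mapped.close()
os.remove(tmp.name)
```

Nothing is copied up front with this approach; the operating system pages data in as the view is indexed, which may suit files that are only partly read. The view is read-only here because the mapping uses ACCESS_READ.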

5 Comments

I'll try that out now.
I get this message: 'array.array' has no attribute 'array'
That was because I had "from array import array" in my imports; when I changed to "import array" the problem was solved.
@RTC222: Yeah, I'm not a fan of the "module and only class in it share the same name" thing. In modern Python, they probably would have named the class Array (matching PEP8 for non-built-ins, like collections.OrderedDict), but we're stuck with legacy names forever, whee!
I don't like that either because it's confusing. I also prefer to import the whole module, not just a class (e.g. from xxx import yyy).
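For anyone hitting the same message, here is a small sketch of the name clash discussed in these comments. The alias array_cls is hypothetical and only there so both import styles can sit side by side.

```python
import array                           # the module
from array import array as array_cls   # the class, under a hypothetical alias

# With "import array", the class is reached through the module.
values = array.array('d', [1.0, 2.0])

# With "from array import array", the bare name is already the class.
same = array_cls('d', [1.0, 2.0])

# Asking the class for an "array" attribute reproduces the reported error.
try:
    array_cls.array('d')
    error = None
except AttributeError as exc:
    error = exc
```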
