I'm working with large image datasets stored in a non-standard image format (.tsm). It is essentially a binary file with some headers at the start, very similar to the FITS standard except that the data is stored little-endian, as opposed to FITS's big-endian.

After reading the file header and formatting the metadata, I can read a single image using the following code:

    def __read_slice(self, file, img_num, dimensions):
        """Read a single image slice from .tsm file"""

        pixel_range = self.metadata["pixel range"]
        bytes_to_read = self.metadata["bytes to read"]

        # position file pointer to correct byte
        file.seek(self.HEADER_TOTAL_LEN + (bytes_to_read * img_num), 0)

        all_bytes = file.read(bytes_to_read)  # read image bytes
        img = np.empty(len(pixel_range), dtype='uint16')  # preallocate image vector

        byte_idx = 0
        for idx, pixel in enumerate(pixel_range):
            img[idx] = (all_bytes[byte_idx + 1] << 8) + all_bytes[byte_idx]
            byte_idx += 2

        return np.reshape(img, (dimensions[1], dimensions[0]))  # reshape array to correct dimensions 

The trouble is that the images can be very large (2048x2048), so even loading just 20-30 frames for processing can take a significant amount of time. I'm new to Python, so I'm guessing the code here is pretty inefficient, especially the loop.

Is there a more efficient way to convert the byte data into 16-bit integers?

1 Answer

You can try:

    img = np.frombuffer(all_bytes, dtype='uint16')

Example:

    >>> np.frombuffer(b'\x01\x02\x03\x04', dtype='uint16')
    array([ 513, 1027], dtype=uint16)
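
One caveat: `dtype='uint16'` uses the machine's native byte order, which is little-endian on most desktop hardware but not guaranteed. Below is a minimal sketch of the whole read step with the byte order spelled out as `'<u2'`, assuming 2 bytes per pixel. The function name `read_slice`, its parameters, and the in-memory fake file are illustrative only and are not taken from the original code.

```python
import io
import numpy as np

def read_slice(file, img_num, width, height, header_len):
    """Read one 16-bit little-endian frame from an open binary file."""
    bytes_per_frame = width * height * 2  # assumes 2 bytes per pixel
    # Position the file pointer at the start of the requested frame
    file.seek(header_len + bytes_per_frame * img_num, 0)
    raw = file.read(bytes_per_frame)
    if len(raw) != bytes_per_frame:
        raise ValueError("file ended before a full frame could be read")
    # '<u2' = little-endian unsigned 16-bit, independent of the host byte order
    img = np.frombuffer(raw, dtype='<u2')
    return img.reshape(height, width)

# Tiny in-memory stand-in for a real file: a fake 4-byte header plus two 2x3 frames
header = b'HEAD'
frames = np.arange(12, dtype='<u2')
fake_file = io.BytesIO(header + frames.tobytes())

second = read_slice(fake_file, img_num=1, width=3, height=2, header_len=len(header))
```

Note that `np.frombuffer` on a `bytes` object returns a read-only view of the data, so call `.copy()` on the result if the frame needs to be modified in place.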

2 Comments

Wow, that was so much faster, thank you! I clearly have a lot to learn about buffered input. In general, is there a useful source anyone knows of to learn more about what's going on under the covers? I assumed this type of operation at the lowest level would be taken care of by some kind of for loop anyway.

You'll have to read the documentation for any library you intend to use. Python in general targets correctness and expressiveness of code and not efficiency, and that is why extensive calculations needing a lot of CPU time are offloaded to native libraries (written in C, Fortran and other languages), leaving high-level concepts to Python.
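
To make that point concrete, here is a rough sketch that decodes the same bytes both ways and checks that the results agree. The work is the same in both cases; the difference is whether the loop runs in the Python interpreter or inside NumPy's compiled code. The function names `decode_loop` and `decode_vectorised` are made up for illustration, and timings vary by machine, so none are quoted.

```python
import timeit
import numpy as np

raw = bytes(range(256)) * 8  # 2048 bytes, i.e. 1024 little-endian 16-bit pixels

def decode_loop(data):
    """Combine byte pairs one at a time in the Python interpreter."""
    out = np.empty(len(data) // 2, dtype='uint16')
    for i in range(len(out)):
        out[i] = (data[2 * i + 1] << 8) + data[2 * i]
    return out

def decode_vectorised(data):
    """Let NumPy reinterpret the whole buffer in compiled code."""
    return np.frombuffer(data, dtype='<u2')

loop_result = decode_loop(raw)
fast_result = decode_vectorised(raw)
same = np.array_equal(loop_result, fast_result)

# Rough timing comparison; print the two numbers to compare on your own machine
loop_time = timeit.timeit(lambda: decode_loop(raw), number=10)
fast_time = timeit.timeit(lambda: decode_vectorised(raw), number=10)
```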
