
I'm looking for a way, in pure Python, to export a NumPy array to either a text file or a compressed image file. My array is 500x700, so exporting it produces a file of about 3 MB. I need it to be under 1 MB.

I've tried the tifffile package, but its compression levels only go from 1 to 9, which is not enough.

My file could be in a "matrix" format, an X Y Z format, a TIFF image, or any other format that compresses without losing data and is compatible with GIS software.

I also need it to be in pure Python and, if possible, without external requirements (like calling a subprocess), to avoid compatibility problems when running on a remote machine.

Any ideas?

  • What kind of numbers? What is their range? To what accuracy do they need to be reconstructed? Is there any correlation in the numbers? I.e., are nearby numbers in the matrix closer to each other in value than numbers that are further apart? If the numbers look effectively random, then you would get almost no compression using standard lossless compressors. To get a factor of three, your data needs to be compressible in the first place due to redundancy and correlation, and you would need to be able to exploit that. Commented May 20, 2016 at 18:39
  • My array is a grid produced by a kriging interpolation. The range varies, but could be, for example, from 75 to 2500. Commented May 20, 2016 at 18:52
  • Then why not send/save just the values that you are interpolating between? Leave the interpolation to the receiver. It sounds like that may compress it quite a bit right there. Commented May 20, 2016 at 19:12
  • Because the point of the script is to automate the interpolation and the production of the resulting grid. I'm just trying to find the lightest output possible for the grid. Commented May 20, 2016 at 19:15
  • Well, given that it is an interpolation, there is a great deal of correlation between samples. You should use a predictor based on neighboring samples, which will leave you with the residuals after prediction to compress. Commented May 20, 2016 at 19:52
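
To illustrate the predictor idea from the last comment, here is a minimal sketch using only NumPy and the standard-library zlib module. The synthetic surface, the 0.1-unit quantization, and the variable names are assumptions made for the example and do not come from the question. It uses the simplest possible predictor (each sample is predicted by its left neighbour) and compares the compressed size of the raw values with that of the residuals. GeoTIFF writers expose a similar idea through the GTiff PREDICTOR creation option.

```python
import zlib
import numpy as np

# Synthetic smooth surface standing in for a kriged grid (an assumption);
# values span 75 to 2500, the example range given in the comments.
rows, cols = 700, 500
yy, xx = np.mgrid[0:rows, 0:cols]
grid = 1287.5 + 1212.5 * np.sin(xx / 150.0) * np.cos(yy / 200.0)

# Quantize to 0.1 units so the values become integers (assumed tolerance).
q = np.round(grid * 10).astype(np.int32)

# Predictor: each sample is predicted by its left neighbour, so only the
# difference (residual) is stored. The first column is kept unchanged.
residuals = q.copy()
residuals[:, 1:] = q[:, 1:] - q[:, :-1]

raw_size = len(zlib.compress(q.tobytes(), 9))
residual_size = len(zlib.compress(residuals.tobytes(), 9))

# The predictor is inverted exactly with a cumulative sum along each row.
restored = np.cumsum(residuals, axis=1, dtype=np.int32)

print("compressed raw values:", raw_size, "bytes")
print("compressed residuals: ", residual_size, "bytes")
print("exact round trip:", np.array_equal(restored, q))
```

How much is gained depends on how smooth the real grid is, so the two sizes should be checked on actual data rather than taken from this sketch.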

2 Answers


If you need it for GIS software, then use either GDAL or rasterio. Use, for example, the GTiff driver to make a GeoTiff.

Assuming you have floats, here is GDAL:

import numpy as np
from osgeo import gdal
gdal.UseExceptions()
driver = gdal.GetDriverByName('GTiff')
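# Create(path, xsize (columns), ysize (rows), band count, data type, creation options)
ds = driver.Create('file.tif', 500, 700, 1, gdal.GDT_Float32, ['COMPRESS=LZW'])
ly = ds.GetRasterBand(1)
# WriteArray expects an array of shape (rows, columns), i.e. (ysize, xsize)
ly.WriteArray(np.arange(500 * 700, dtype='f').reshape(700, 500))
ly = ds = None  # dereference to flush to disk and close the file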

Or rasterio:

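import numpy as np
import rasterio
# Bind the opened dataset to ds so it can be written to inside the block
with rasterio.open('file2.tif', 'w', driver='GTiff', width=500, height=700, count=1, dtype='float32', COMPRESS='LZW') as ds:
    ds.write(np.arange(500 * 700, dtype='float32').reshape(1, 700, 500))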

These files are <1 MB. You can get smaller if you use Byte or Int16 types.

(Note: projection and georeferencing were not added.)
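
As a rough illustration of the remark about smaller data types, the sketch below compares a float32 grid with a scaled 16-bit integer version. Everything in it is an assumption made for the example: the synthetic surface, the scale factor of 10 (one decimal place), and zlib standing in for the compression applied inside a TIFF. Scaling to integers only preserves the data exactly if the real values carry no more precision than the chosen scale.

```python
import zlib
import numpy as np

# Synthetic smooth surface standing in for the real grid (an assumption).
rows, cols = 700, 500
yy, xx = np.mgrid[0:rows, 0:cols]
grid = (1287.5 + 1212.5 * np.sin(xx / 150.0) * np.cos(yy / 200.0)).astype(np.float32)

# Assumed tolerance: keep one decimal place by storing value * 10.
# The largest stored value is 25000, which fits in uint16 (max 65535).
scale = 10.0
as_uint16 = np.round(grid * scale).astype(np.uint16)

float_bytes = len(zlib.compress(grid.tobytes(), 9))
int_bytes = len(zlib.compress(as_uint16.tobytes(), 9))

# Reading the data back: divide by the same scale factor.
restored = as_uint16.astype(np.float32) / scale
max_error = float(np.abs(restored - grid).max())

print("float32: raw", grid.nbytes, "bytes, compressed", float_bytes)
print("uint16:  raw", as_uint16.nbytes, "bytes, compressed", int_bytes)
print("largest rounding error:", max_error)
```

A 500x700 grid of 16-bit values is 700,000 bytes before any compression, so it is already below the 1 MB target. The scale factor has to be recorded somewhere so the receiver can undo it.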


Have a look at np.savez_compressed

numpy.savez_compressed(file, *args, **kwds)

Save several arrays into a single file in compressed .npz format.

Example

import numpy as np
from tempfile import TemporaryFile
outfile = TemporaryFile()
x = np.arange(10)
y = np.sin(x)
np.savez_compressed(outfile, x, y)
outfile.seek(0)  # Only needed here to simulate closing & reopening file
npzfile = np.load(outfile)
print(npzfile.files)     # ['arr_0', 'arr_1']
print(npzfile['arr_0'])  # [0 1 2 3 4 5 6 7 8 9]

1 Comment

Could you explain why? It's pretty much just a container with zlib compression, isn't it?
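
The container question raised in this comment can be checked directly. The following is a small sketch in which the array contents and the key name grid are made up for the example: it writes an .npz archive into memory, inspects it with the standard-library zipfile module, and loads the array back.

```python
import io
import zipfile
import numpy as np

# Placeholder data with the shape from the question (an assumption).
grid = np.linspace(75.0, 2500.0, 500 * 700, dtype=np.float32).reshape(700, 500)

# Write the archive into memory instead of onto disk.
buffer = io.BytesIO()
np.savez_compressed(buffer, grid=grid)

# An .npz file is a ZIP archive holding one .npy member per saved array.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    methods = [info.compress_type for info in archive.infolist()]

# Load the array back and confirm that nothing was lost.
buffer.seek(0)
with np.load(buffer) as loaded:
    round_trip = loaded["grid"]

print(members)
print(methods == [zipfile.ZIP_DEFLATED])
print(np.array_equal(round_trip, grid))
```

The members are stored with ZIP's DEFLATE method, the same algorithm zlib implements, so the compression is lossless but general-purpose and does nothing specific to exploit the correlation between neighbouring cells.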
