1

I have some large geotiff file,size range from 200MB to 10GB. They are produced by drones following a specific route, capturing images at a certain width. Here is an example with mosaic: sample tiff The file size is about 200MB.

Now i want to get the polygon info in this tif file,like the area marked by red line in the picture. So i tried to use PolygonExtractionProcess to do the job,but it takes like forever and my cpu almost runs up to 100%,i have to give up.

Then i find some "middle" tif files which are only white and black,their size is quite small(from 50KB to 10MB),like this: black white tiff

After i use this small tif as input,the polygonize process last about 1-2minutes,i get some result like this(not exactly what i want but pretty close,i only want the black part): polygonize result

The code is like this(just very simple usage):

  AbstractGridCoverage2DReader gridCoverage2DReader = gridFormat.getReader(iis, hints);
                GridCoverage2D coverage = gridCoverage2DReader.read(null);
                PolygonExtractionProcess process = new PolygonExtractionProcess();
                SimpleFeatureCollection features = process.execute(coverage, 0, Boolean.TRUE, null, null, null, null);
                SimpleFeatureIterator sfi = features.features();

After some search, i realize that before polygonize,i need to grayscale or black white binarization the tiff file to reduce the file size.But it seems very hard to do so with JAVA/GeoTools,lots of solutions are using GDAL or GIS software like QGIS/ARCGIS.

My questions are:

  1. Am i on the right track? If not,is there any standard solution?
  2. Can Java/Geotools do the grayscale job of large tif file efficiently and not using to much memory?
  3. How can i only get the black part polygon from the black white tiff?
9
  • GDAL is your best bet here - it’s one line of code. Commented Aug 7, 2024 at 21:34
  • gdal.org/programs/gdal_polygonize.html Commented Aug 7, 2024 at 21:39
  • Is your TIFF tiled and compressed so GeoTools doesn't have to read the whole file into memory to access parts of it? Commented Aug 8, 2024 at 12:37
  • @iskandarblue Thanks for your reply. Right now I have to avoid using other language and workaround than java (lots of reason).But I do realize GDAL is very strong.If there is really no other way,I will dig into this.Thanks anyway! Commented Aug 9, 2024 at 1:24
  • I have a similar issue. When I use GeoTools to vectorize by region, it takes about half an hour, but with GDAL, I can get results in just 10 seconds. Commented Aug 9, 2024 at 1:28

1 Answer 1

0

Usually, before performing raster-to-vector conversion, I will binarize the raster.

use GDAL

    /**
 * TIFF Filter
 *
 * @param sourceTiff Source TIFF file
 * @param targetTiff Target TIFF file
 * @param min        Minimum value
 * @param max        Maximum value
 * @param nBand      Band number
 */
public static void tiffFilter(Path sourceTiff, Path targetTiff, double min, double max, int nBand) {
    if (Files.notExists(sourceTiff)) {
        throw new ServiceException("File does not exist!");
    }
    Dataset inputDataset = gdal.Open(sourceTiff.toAbsolutePath().toString(), gdalconst.GA_ReadOnly);
    if (inputDataset == null) {
        throw new ServiceException("Unable to open raster file: " + gdal.GetLastErrorMsg());
    }
    // Get basic information about the raster data
    int xSize = inputDataset.GetRasterXSize();
    int ySize = inputDataset.GetRasterYSize();
    // Read the raster data
    float[] data = new float[xSize * ySize];
    try {
        Band band = inputDataset.GetRasterBand(nBand);
        band.ReadRaster(0, 0, xSize, ySize, data);
        // Apply filter operations, such as setting pixel values below a threshold to 0
        for (int i = 0; i < data.length; i++) {
            if (data[i] > max || data[i] < min) {
                data[i] = 0;
                continue;
            }
            // Set pixel values greater than the threshold to 1
            data[i] = 1;
        }
    } catch (Exception e) {
        inputDataset.delete();
        throw e;
    }
    Dataset outputDataset = null;
    try {
        // Create a new raster dataset
        Driver driver = gdal.GetDriverByName(inputDataset.GetDriver().getShortName());
        if (driver == null) {
            throw new ServiceException("Unable to get GTiff driver: " + gdal.GetLastErrorMsg());
        }
        outputDataset = driver.Create(targetTiff.toAbsolutePath().toString(), xSize, ySize, 1, gdalconst.GDT_Float64);
        if (outputDataset == null) {
            throw new ServiceException("Unable to create output raster file: " + gdal.GetLastErrorMsg());
        }
        // Write the filtered data
        Band outputBand = outputDataset.GetRasterBand(nBand);
        outputBand.WriteRaster(0, 0, xSize, ySize, data);
        outputBand.SetNoDataValue(0);
        // Set the georeference information and projection
        outputDataset.SetGeoTransform(inputDataset.GetGeoTransform());
        outputDataset.SetProjection(inputDataset.GetProjection());
    } finally {
        // Cleanup
        inputDataset.delete();
        if (Objects.nonNull(outputDataset)) {
            outputDataset.delete();
        }
    }
    if (Files.notExists(targetTiff)) {
        throw new ServiceException("Raster filtering failed!");
    }
}

GeoTools

In this method, I used JDK 8's parallel stream processing. Although dividing by boundaries saves memory, it is still very slow. If there are better solutions, we can discuss and exchange ideas.

/**
 * Converts the grid coverage to a SimpleFeatureCollection.
 * This method divides the given grid coverage into multiple small grids, extracts the geometry for each small grid,
 * and then merges these geometries into a SimpleFeatureCollection.
 *
 * @param gridCoverage2D The grid coverage that provides the geographic range and data to be processed.
 * @param nodata         The value used to fill no-data regions.
 * @param handle         A functional interface used for further processing of the merged geometries.
 * @param row            The number of rows to divide.
 * @param cols           The number of columns to divide.
 * @return Returns a SimpleFeatureCollection containing the processed geometries.
 */
public static SimpleFeatureCollection subGrid2SFC(GridCoverage2D gridCoverage2D, double nodata, boolean insideEdges,
                                                  int row, int cols, Function<Geometry, Geometry> handle) {
    // Divide the grid coverage into smaller regions
    List<Envelope2D> envelope2DS = GeoToolsUtils.divideEnvelope(gridCoverage2D.getEnvelope2D(), row, cols);

    PolygonExtractionProcess process = new PolygonExtractionProcess();
    List<Geometry> rets = new CopyOnWriteArrayList<>();
    AtomicInteger atomicInteger = new AtomicInteger();
    // Parallel processing of each small grid
    envelope2DS.parallelStream()
            .forEach(d -> {
                // Log the processing progress
                if (LOG.isDebugEnabled()) {
                    int i = atomicInteger.incrementAndGet();
                    LOG.debug("Current progress of raster to vector conversion: {} out of {}", i, envelope2DS.size());
                }
                // Convert the boundaries of the small grid into polygons
                Polygon polygon = GeoToolsUtils.createPolygon(d);
                // Extract data from the grid coverage using the polygon and create a SimpleFeatureCollection
                SimpleFeatureCollection ret = process.execute(gridCoverage2D, 0, insideEdges, polygon,
                        Collections.singleton(nodata), null, null);
                if (ret.isEmpty()) {
                    return;
                }
                // Convert the SimpleFeatureCollection into a list of geometries
                List<Geometry> geometries = GeoToolsUtils.fcToGeometryList(ret);
                // Merge the list of geometries into a single geometry
                Geometry union = GeoToolsUtils.union(geometries);
                if (Objects.isNull(union) || union.isEmpty()) {
                    return;
                }
                // Apply additional geometry processing logic
                Geometry apply = handle.apply(union);
                rets.add(apply);
            });
    CoordinateReferenceSystem crs = gridCoverage2D.getCoordinateReferenceSystem();
    return GeoToolsUtils.geometries2FeatureCollection(rets, "subGrid2SFC", crs);
}

GDAL

Using GDAL is faster and more memory-efficient.

#!/bin/bash

# Check the number of parameters
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <input_tiff> <band> <output_shapefile>"
    exit 1
fi

# Parameter 1: Path to the input TIFF file
INPUT_TIFF=$1

BAND=$2

# Parameter 2: Path to the output shapefile
OUTPUT_SHP=$3

# Execute the gdal_polygonize.py command
gdal_polygonize.py "$INPUT_TIFF" -b "$BAND" -f 'ESRI Shapefile' "$OUTPUT_SHP" 1 fid
    /**
     * 栅格转矢量
     *
     * @param gridPath        网格路径
     * @param targetShapePath 目标形状路径
     * @param band            波段
     */
    public static void grid2Vector(Path gridPath, Path targetShapePath, int band) {
        String gdalCMD = your_shell_path + "/gdal_polygonize.sh %s %s %s";
        String cmd = String.format(gdalCMD, gridPath.toAbsolutePath(), band, targetShapePath.toAbsolutePath());
        LOG.debug("gdal_polygonize cmd:{}", cmd);
        String rests = RuntimeUtil.execForStr("/bin/bash " + cmd);
        LOG.debug("gdal 栅格转矢量输出 {}", rests);
        if (Files.notExists(targetShapePath)) {
            throw new ServiceException("gdal 栅格转矢量失败!");
        }
    }

Q: How can i only get the black part polygon from the black white tiff?

A: The white areas can be set to 0, the black areas to 1, and the remaining areas as NoData values. When exporting to vector, GDAL will include the corresponding attributes.

Sign up to request clarification or add additional context in comments.

12 Comments

Thanks,I think our thoughts are close!But my files are too large,I can't create a float[20000*40000] to manipulate the bit data to get the gray band file,that takes tooooo much memory.And I'm a little confused,if you divide your coverage to grids and do the polygonize separately,won't that "split" the polygon to several sub polygons?BTW are you from China?
You can divide the width and length into several parts to read, read in segments, and write out in segments. Split into multiple sub-polygons, and after completing this, I can merge them. However, I recommend still using GDAL.
you mean use GDAl driver to create a same size tiff and then loop to write the band data to it?
yeah, then use Java code to call a shell script to execute the raster-to-vector conversion on the raster result. This is the fastest solution.
to be honest, it might be because my skill level is not high enough, but I find using GeoTools to process raster files in engineering projects to be cumbersome and not as convenient as GDAL. However, all the GIS databases in our project are based on either GeoTools or GDAL, and sometimes both are used together. There's no need for other tools because these two are already excellent.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.