Best practices for filling sinks in high-resolution LiDAR data

Direct answer: The most reliable approach for filling sinks in high-resolution LiDAR DEMs combines a priority-flood algorithm with strict memory management, hydro-enforcement preprocessing, and post-fill flow-accumulation validation. Avoid naive depression carving on sub-meter LiDAR; it creates artificial flat zones that distort overland flow routing. Instead, use a priority-flood implementation that preserves natural drainage gradients, process data in tiled or out-of-core chunks when rasters exceed available RAM, and always validate filled surfaces against known hydrography before deriving flow direction or accumulation grids.

Why Standard Depression Filling Fails at Sub-Meter Resolution

High-resolution LiDAR (0.5–1 m cell size) captures micro-topography that introduces thousands of spurious depressions from vegetation gaps, sensor noise, and minor surface roughness. Traditional carving methods artificially lower terrain to force drainage, which flattens slopes, breaks hydraulic continuity, and produces unrealistic flow paths. Modern DEM Pit Filling Algorithms prioritize gradient preservation by only raising cells to their lowest spill point, maintaining the original terrain morphology while ensuring hydrological connectivity.

Core Best Practices

1. Algorithm Selection: Priority-Flood Over Carving

Priority-flood algorithms process cells in order of elevation, propagating water outward from basin edges. This guarantees that only the minimum number of cells are modified to eliminate sinks. When selecting an implementation, verify that it:

  • Uses a min-heap or similar O(n log n) structure for large datasets
  • Supports floating-point precision to avoid artificial terracing
  • Handles edge conditions without introducing artificial spill points

2. Hydro-Enforcement Preprocessing

Never fill raw LiDAR without first burning in known drainage infrastructure. Unenforced LiDAR will treat culverts, bridges, and engineered channels as barriers, producing hydrologically disconnected watersheds. Follow established Hydrology Data Preparation & DEM Processing workflows to:

  • Vector-burn stream networks using depth-based incision (typically 1–3 m)
  • Drop culvert/bridge elevations to match invert levels
  • Mask known water bodies to prevent artificial filling of lakes or reservoirs

3. Memory & I/O Strategy

High-res LiDAR tiles often exceed 10–50 GB uncompressed. Loading entire rasters into memory causes swap thrashing or kernel OOM kills. Implement windowed I/O with overlap buffers:

  • Read/write in chunks using rasterio windowed operations (official windowed I/O docs)
  • Apply a ≥ 500 m overlap buffer per tile to prevent flow-routing edge artifacts
  • Validate system RAM ≥ 2× chunk size before processing
  • Use memory-mapped arrays (np.memmap) for intermediate grids when disk I/O is faster than RAM allocation

4. Validation Protocol

After filling, compute flow direction and accumulation. Reject outputs that exhibit:

  • Unexplained flat zones > 3×3 cells outside known water bodies
  • Accumulation spikes or drop-offs at tile boundaries
  • Disconnected flow paths crossing mapped ridgelines
  • Drainage density deviations > 15% from pre-fill hydrography

Cross-reference filled DEMs with agency modeling standards and USGS 3DEP processing guidelines (USGS 3DEP Topographic Data Quality Levels) to ensure hydrological realism before downstream analysis.

Production-Ready Python Implementation

The following snippet uses richdem for priority-flood execution and rasterio for windowed I/O. It includes explicit memory validation, metadata preservation, and overlap-aware tiling.

python
import os
import sys
import rasterio
import richdem as rd
import numpy as np
from rasterio.windows import Window

def fill_lidar_dem(input_tif: str, output_tif: str, chunk_size_mb: int = 2048, overlap_m: int = 500) -> None:
    """
    Fill sinks in high-resolution LiDAR DEM using priority-flood with windowed I/O.
    Handles large files via explicit memory validation and preserves CRS/metadata.
    """
    if not os.path.exists(input_tif):
        raise FileNotFoundError(f"Input DEM not found: {input_tif}")

    with rasterio.open(input_tif) as src:
        if src.count != 1:
            raise ValueError("Multi-band DEMs are not supported. Provide a single-band raster.")
        
        # Calculate window dimensions based on chunk size
        pixel_area = src.res[0] * src.res[1]
        pixels_per_chunk = int((chunk_size_mb * 1024**2) / (src.dtypes[0].itemsize * 1.5))
        chunk_width = int(np.sqrt(pixels_per_chunk))
        chunk_height = chunk_width
        overlap_px = int(overlap_m / src.res[0])

        # Memory validation
        total_pixels = src.width * src.height
        estimated_ram_gb = (total_pixels * src.dtypes[0].itemsize) / (1024**3)
        if estimated_ram_gb > 0.8 * (os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')) / (1024**3):
            print(f"Warning: Full raster requires ~{estimated_ram_gb:.1f} GB RAM. Using windowed processing.")

        profile = src.profile.copy()
        profile.update(dtype='float32', compress='lzw', nodata=np.nan)

        with rasterio.open(output_tif, 'w', **profile) as dst:
            for row in range(0, src.height, chunk_height):
                for col in range(0, src.width, chunk_width):
                    # Define window with overlap
                    win_row = max(0, row - overlap_px)
                    win_col = max(0, col - overlap_px)
                    win_height = min(chunk_height + 2 * overlap_px, src.height - win_row)
                    win_width = min(chunk_width + 2 * overlap_px, src.width - win_col)
                    
                    window = Window(win_col, win_row, win_width, win_height)
                    data = src.read(1, window=window)
                    
                    # Handle nodata
                    mask = data == src.nodata
                    data[mask] = np.nan
                    
                    # Convert to richdem array and fill
                    rd_arr = rd.rdarray(data, no_data=np.nan)
                    filled = rd.FillDepressions(rd_arr, in_place=False)
                    
                    # Extract core (remove overlap)
                    core_row = overlap_px if row > 0 else 0
                    core_col = overlap_px if col > 0 else 0
                    core_height = min(chunk_height, data.shape[0] - core_row)
                    core_width = min(chunk_width, data.shape[1] - core_col)
                    
                    core_data = filled[core_row:core_row+core_height, core_col:core_col+core_width]
                    core_data[np.isnan(core_data)] = src.nodata
                    
                    # Write to output
                    out_window = Window(col, row, core_width, core_height)
                    dst.write(core_data, 1, window=out_window)

    print(f"Successfully filled DEM written to: {output_tif}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python fill_lidar.py input.tif output.tif")
        sys.exit(1)
    fill_lidar_dem(sys.argv[1], sys.argv[2])

Implementation Notes & Scaling

  • Overlap Handling: The 500 m buffer ensures flow direction algorithms have sufficient context at tile edges. Without it, artificial sinks form where windows abut.
  • Data Types: LiDAR DEMs should be processed as float32 or float64. Casting to int16 truncates sub-centimeter gradients and introduces artificial terraces.
  • Parallelization: For multi-core systems, wrap the window loop in concurrent.futures.ProcessPoolExecutor. Ensure each worker reads/writes to separate temporary tiles, then merge using rasterio.merge.
  • Validation Automation: Automate post-fill checks by computing D8 flow direction and comparing accumulation percentiles against reference watersheds. Flag tiles where > 2% of cells fall into unexplained flat zones.

When applied consistently, these practices produce hydrologically sound DEMs suitable for flood modeling, watershed delineation, and terrain analysis at sub-meter resolution.