Partitioning large watersheds into sub-basins with RichDEM

Partitioning large watersheds into sub-basins with RichDEM follows a deterministic, raster-based hydrological routing sequence: condition the DEM, compute flow direction (D8 or D∞), generate a flow accumulation surface, apply a drainage threshold to isolate the stream network, and segment contiguous contributing areas. RichDEM’s C++ backend leverages memory-mapped I/O and OpenMP parallelization, enabling reproducible sub-basin extraction on multi-gigabyte DEMs without proprietary GIS overhead. This pipeline integrates directly into broader Watershed Delineation & Catchment Synchronization workflows, providing a scalable foundation for regional hydrologic modeling, flood routing, and water quality assessment.

Environment & Dependency Matrix

RichDEM’s Python bindings wrap compiled C++ routing algorithms and rely on strict GDAL alignment. Environment mismatches are the primary cause of silent raster corruption or segmentation faults.

Component Minimum Version Configuration Notes
Python 3.8–3.11 Python 3.12+ requires RichDEM ≥0.3.5 or a conda-forge rebuild
RichDEM 0.3.4 Install via conda install -c conda-forge richdem for precompiled wheels
GDAL 3.4.0 Python gdal binding version must exactly match the system GDAL library
NumPy 1.21 Required for array interoperability and threshold masking
OS Linux/macOS/Windows Windows requires MSVC runtime; conda environments are strongly recommended
Parallelism OpenMP 4.0+ Set OMP_NUM_THREADS before importing RichDEM to control CPU allocation

Critical memory note: RichDEM loads rasters into RAM by default. For DEMs exceeding 8–12 GB, enable virtual memory tiling via memmap=True during load or pre-tile with gdal_translate -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512.

Production Pipeline Code

The following script implements a complete, automated sub-basin partitioning pipeline. It assumes the input DEM has already undergone hydrological conditioning (e.g., sink filling with rd.FillDepressions or external LiDAR preprocessing).

python
import os
import numpy as np
import rasterio
from rasterio.transform import from_origin
import richdem as rd
import warnings

# Set parallelism BEFORE importing RichDEM
os.environ["OMP_NUM_THREADS"] = "8"
warnings.filterwarnings("ignore", category=FutureWarning)

def partition_subbasins(dem_path, out_dir, accumulation_threshold=1000, method="D8"):
    """
    Partition a large watershed into sub-basins using RichDEM.
    
    Parameters
    ----------
    dem_path : str
        Path to hydrologically conditioned DEM (GeoTIFF)
    out_dir : str
        Output directory for intermediate and final rasters
    accumulation_threshold : int
        Minimum flow accumulation cells to define a stream
    method : str
        Routing algorithm: 'D8' or 'Dinf'
    """
    os.makedirs(out_dir, exist_ok=True)
    
    # 1. Capture spatial reference from the source GeoTIFF, then load the
    # elevation grid into a RichDEM array for fast in-memory processing.
    print(f"[1/5] Loading DEM: {dem_path}")
    with rasterio.open(dem_path) as src:
        ref_transform = src.transform
        ref_crs = src.crs
    dem = rd.LoadGDAL(dem_path, no_data=-9999)
    if dem is None:
        raise RuntimeError("Failed to load DEM. Verify GDAL driver support and file integrity.")
    
    # 2. Compute flow direction
    print(f"[2/5] Computing {method} flow direction...")
    routing_method = "Dinf" if method.upper() == "DINF" else "D8"
    fd = rd.FlowDirection(dem, method=routing_method)
    
    # 3. Generate flow accumulation
    print("[3/5] Computing flow accumulation...")
    acc = rd.FlowAccumulation(fd, method=routing_method)
    
    # 4. Threshold to isolate stream network
    print(f"[4/5] Extracting streams (threshold >= {accumulation_threshold} cells)...")
    streams = (np.asarray(acc) >= accumulation_threshold).astype(np.int32)
    
    # 5. Segment watersheds
    print("[5/5] Labeling sub-basins...")
    subbasins = rd.Watersheds(fd, streams)
    
    # Export results with preserved geospatial metadata
    _export_raster(subbasins, os.path.join(out_dir, "subbasins.tif"), ref_transform, ref_crs)
    _export_raster(np.asarray(acc), os.path.join(out_dir, "flow_accum.tif"), ref_transform, ref_crs)
    _export_raster(streams, os.path.join(out_dir, "stream_network.tif"), ref_transform, ref_crs)
    
    print("Pipeline complete. Outputs saved to:", out_dir)

def _export_raster(data, out_path, transform, crs):
    """Export an array to GeoTIFF using a captured rasterio transform and CRS."""
    arr = np.asarray(data)
    with rasterio.open(
        out_path,
        "w",
        driver="GTiff",
        height=arr.shape[0],
        width=arr.shape[1],
        count=1,
        dtype=arr.dtype,
        crs=crs or "EPSG:4326",
        transform=transform,
        compress="LZW",
        tiled=True,
        blockxsize=512,
        blockysize=512,
        nodata=-9999
    ) as dst:
        dst.write(arr, 1)

Memory & Parallelism Optimization

Large-scale hydrological routing is I/O and memory-bound. To prevent out-of-memory crashes or thread contention:

  1. Enable memory mapping: Pass memmap=True to rd.LoadGDAL(). This forces the OS to page raster blocks on demand rather than loading the entire array into physical RAM.
  2. Pre-tile source DEMs: Use gdal_translate with TILED=YES and BLOCKXSIZE=512 before processing. Tiled GeoTIFFs align with RichDEM’s internal chunking strategy, reducing disk seek overhead. See GDAL GeoTIFF Driver Options for configuration details.
  3. Control OpenMP threads: Set OMP_NUM_THREADS to match physical cores (not logical threads). Hyperthreading rarely improves raster routing performance and can increase cache thrashing.
  4. Avoid intermediate copies: RichDEM returns rdArray objects that share memory with underlying buffers. Do not call .copy() unless explicitly modifying values; otherwise, you will double memory consumption.

Validation & Integration

Automated partitioning requires post-processing validation to ensure hydrological realism:

  • Drainage Density Check: Calculate total stream length per unit area. Values significantly outside regional norms (e.g., 0.5–3.0 km/km² for temperate basins) indicate threshold misalignment or DEM artifacts.
  • Gauge Alignment: Overlay USGS/NHD stream networks or field-mapped catchment boundaries. Sub-basin outlets should align with known monitoring stations or confluence points.
  • Threshold Sensitivity: The accumulation_threshold parameter controls basin granularity. Lower values produce finer, more fragmented sub-basins; higher values merge tributaries into larger units. For methodological guidance on threshold selection and aggregation logic, consult Basin Partitioning Strategies.
  • Algorithm Choice: D8 routing is computationally efficient and suitable for coarse-resolution DEMs (>10 m). D∞ distributes flow across two downslope cells, reducing artificial channelization but increasing runtime by ~15–20%. Refer to the RichDEM Documentation for algorithm-specific parameters and edge-case handling.

When integrating outputs into distributed hydrologic models (SWAT, HEC-HMS, or custom Python frameworks), ensure CRS consistency, verify that -9999 nodata values are respected during raster math, and maintain a version-controlled record of the threshold and routing method used. This guarantees reproducibility and simplifies catchment synchronization across multi-temporal datasets.