Name

ST_ClusterDBSCAN — Window function that returns a cluster id for each input geometry using the DBSCAN algorithm.

Synopsis

integer ST_ClusterDBSCAN(geometry winset geom, float8 eps, integer minpoints);

Description

A window function that returns a cluster number for each input geometry, using the 2D Density-based spatial clustering of applications with noise (DBSCAN) algorithm. Unlike ST_ClusterKMeans, it does not require the number of clusters to be specified, but instead uses the desired distance (eps) and density (minpoints) parameters to determine each cluster.

An input geometry is added to a cluster if it is either:

  • A "core" geometry, that is within eps distance of at least minpoints input geometries (including itself); or

  • A "border" geometry, that is within eps distance of a core geometry.

Note that border geometries may be within eps distance of core geometries in more than one cluster. Either assignment would be correct, so the border geometry will be arbitrarily asssigned to one of the available clusters. In this situation it is possible for a correct cluster to be generated with fewer than minpoints geometries. To ensure deterministic assignment of border geometries (so that repeated calls to ST_ClusterDBSCAN will produce identical results) use an ORDER BY clause in the window definition. Ambiguous cluster assignments may differ from other DBSCAN implementations.

[Note]

Geometries that do not meet the criteria to join any cluster are assigned a cluster number of NULL.

Availability: 2.3.0

This method supports Circular Strings and Curves.

Examples

Clustering polygon within 50 meters of each other, and requiring at least 2 polygons per cluster.

Clusters within 50 meters with at least 2 items per cluster. Singletons have NULL for cid

SELECT name, ST_ClusterDBSCAN(geom, eps => 50, minpoints => 2) over () AS cid
FROM boston_polys
WHERE name > '' AND building > ''
	AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-71.04054 42.35141)', 4326), 26986),
           500);

                name                 | bucket
-------------------------------------+--------
 Manulife Tower                      |      0
 Park Lane Seaport I                 |      0
 Park Lane Seaport II                |      0
 Renaissance Boston Waterfront Hotel |      0
 Seaport Boston Hotel                |      0
 Seaport Hotel & World Trade Center  |      0
 Waterside Place                     |      0
 World Trade Center East             |      0
 100 Northern Avenue                 |      1
 100 Pier 4                          |      1
 The Institute of Contemporary Art   |      1
 101 Seaport                         |      2
 District Hall                       |      2
 One Marina Park Drive               |      2
 Twenty Two Liberty                  |      2
 Vertex                              |      2
 Vertex                              |      2
 Watermark Seaport                   |      2
 Blue Hills Bank Pavilion            |   NULL
 World Trade Center West             |   NULL
(20 rows)

A example showing combining parcels with the same cluster number into geometry collections.

SELECT cid, ST_Collect(geom) AS cluster_geom, array_agg(parcel_id) AS ids_in_cluster FROM (
    SELECT parcel_id, ST_ClusterDBSCAN(geom, eps => 0.5, minpoints => 5) over () AS cid, geom
    FROM parcels) sq
GROUP BY cid;