29. Nearest-Neighbour Searching

29.2. Nearest Neighbor Join

The index assisted order by operator has one major draw back: it only works with a single geometry literal on one side of the operator. This is fine for finding the objects nearest to one query object, but does not help for a spatial join, where the goal is to find the nearest neighbor for each of a full set of candidates.

Fortunately, there’s a SQL language feature that allows us to run a query repeatedly driven in a loop: the LATERAL join.

Here we will find the nearest street to each subway station:

SELECT subways.gid AS subway_gid,
       subways.name AS subway,
       streets.name AS street,
       streets.gid AS street_gid,
       streets.geom::geometry(MultiLinestring, 26918) AS street_geom,
       streets.dist
FROM nyc_subway_stations subways
CROSS JOIN LATERAL (
  SELECT streets.name, streets.geom, streets.gid, streets.geom <-> subways.geom AS dist
  FROM nyc_streets AS streets
  ORDER BY dist
  LIMIT 1
) streets;

Note the way the CROSS JOIN LATERAL acts as the inner part of a loop driven by the subways table. Each record in the subways table gets fed into the lateral subquery, one at a time, so you get a nearest result for each subway record.

_images/knn4.png

The explain shows the loop on the subway stations, and the index-assisted order by inside the loop where we want it:

                           QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.28..13140.71 rows=491 width=37)
   ->  Seq Scan on nyc_subway_stations subways
       (cost=0.00..15.91 rows=491 width=46)
   ->  Limit
       (cost=0.28..1.71 rows=1 width=170)
         ->  Index Scan using nyc_streets_geom_idx on nyc_streets streets
             (cost=0.28..27410.12 rows=19091 width=170)
                Order By: (geom <-> subways.geom)