Name

Geocode — Takes in an address as a string (or other normalized address) and outputs a set of possible locations which include a point geometry in NAD 83 long lat, a normalized address for each, and the rating. The lower the rating the more likely the match. Results are sorted by lowest rating first. Can optionally pass in maximum results, defaults to 10, and restrict_region (defaults to NULL)

Synopsis

setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);

setof record geocode(norm_addy in_addy, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);

Description

Takes in an address as a string (or already normalized address) and outputs a set of possible locations which include a point geometry in NAD 83 long lat, a normalized_address (addy) for each, and the rating. The lower the rating the more likely the match. Results are sorted by lowest rating first. Uses Tiger data (edges,faces,addr), PostgreSQL fuzzy string matching (soundex,levenshtein) and PostGIS line interpolation functions to interpolate address along the Tiger edges. The higher the rating the less likely the geocode is right. The geocoded point is defaulted to offset 10 meters from center-line off to side (L/R) of street address is located on.

Enhanced: 2.0.0 to support Tiger 2010 structured data and revised some logic to improve speed, accuracy of geocoding, and to offset point from centerline to side of street address is located on. New parameter max_results useful for specifying ot just return the best result.

Examples: Basic

The below examples timings are on a 3.0 GHZ single processor Windows 7 machine with 2GB ram running PostgreSQL 9.1rc1/PostGIS 2.0 loaded with all of MA,MN,CA, RI state Tiger data loaded.

Exact matches are faster to compute (61ms)

SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, 
	(addy).address As stno, (addy).streetname As street, 
	(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip 
	FROM geocode('75 State Street, Boston MA 02109') As g;  
 rating |        lon        |       lat        | stno | street | styp |  city  | st |  zip  
--------+-------------------+------------------+------+--------+------+--------+----+-------
      0 | -71.0556722990239 | 42.3589914927049 |   75 | State  | St   | Boston | MA | 02109

Even if zip is not passed in the geocoder can guess (took about 122-150 ms)

SELECT g.rating, ST_AsText(ST_SnapToGrid(g.geomout,0.00001)) As wktlonlat, 
	(addy).address As stno, (addy).streetname As street, 
	(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip 
	FROM geocode('226 Hanover Street, Boston, MA',1) As g;  
 rating |         wktlonlat         | stno | street  | styp |  city  | st |  zip  
--------+---------------------------+------+---------+------+--------+----+-------
      1 | POINT(-71.05528 42.36316) |  226 | Hanover | St   | Boston | MA | 02113

Can handle misspellings and provides more than one possible solution with ratings and takes longer (500ms).

SELECT g.rating, ST_AsText(ST_SnapToGrid(g.geomout,0.00001)) As wktlonlat, 
    (addy).address As stno, (addy).streetname As street, 
	(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip 
	FROM geocode('31 - 37 Stewart Street, Boston, MA 02116') As g; 
 rating |         wktlonlat         | stno | street | styp |  city  | st |  zip  
--------+---------------------------+------+--------+------+--------+----+-------
     70 | POINT(-71.06459 42.35113) |   31 | Stuart | St   | Boston | MA | 02116
	

Using to do a batch geocode of addresses. Easiest is to set max_results=1. Only process those not yet geocoded (have no rating).

CREATE TABLE addresses_to_geocode(addid serial PRIMARY KEY, address text,
		lon numeric, lat numeric, new_address text, rating integer);

INSERT INTO addresses_to_geocode(address)
VALUES ('529 Main Street, Boston MA, 02129'),
 ('77 Massachusetts Avenue, Cambridge, MA 02139'),
 ('25 Wizard of Oz, Walaford, KS 99912323'),
 ('26 Capen Street, Medford, MA'),
 ('124 Mount Auburn St, Cambridge, Massachusetts 02138'),
 ('950 Main Street, Worcester, MA 01610');
 
-- only update the first 3 addresses (323-704 ms -  there are caching and shared memory effects so first geocode you do is always slower) --
-- for large numbers of addresses you don't want to update all at once
-- since the whole geocode must commit at once 
-- For this example we rejoin with LEFT JOIN 
-- and set to rating to -1 rating if no match 
-- to ensure we don't regeocode a bad address 
UPDATE addresses_to_geocode
  SET  (rating, new_address, lon, lat) 
	= ( COALESCE((g.geo).rating,-1), pprint_addy((g.geo).addy),
	   ST_X((g.geo).geomout)::numeric(8,5), ST_Y((g.geo).geomout)::numeric(8,5) )
FROM (SELECT addid 
	FROM addresses_to_geocode 
	WHERE rating IS NULL ORDER BY addid LIMIT 3) As a
	LEFT JOIN (SELECT addid, (geocode(address,1)) As geo
    FROM addresses_to_geocode As ag
	WHERE ag.rating IS NULL ORDER BY addid LIMIT 3) As g ON a.addid = g.addid
WHERE a.addid = addresses_to_geocode.addid;

result
-----
Query returned successfully: 3 rows affected, 480 ms execution time.

SELECT * FROM addresses_to_geocode WHERE rating is not null;

 addid |                   address                    |    lon    |   lat    |                new_address                | rating 
-------+----------------------------------------------+-----------+----------+-------------------------------------------+--------
     1 | 529 Main Street, Boston MA, 02129            | -71.07181 | 42.38359 | 529 Main St, Boston, MA 02129             |      0
     2 | 77 Massachusetts Avenue, Cambridge, MA 02139 | -71.09428 | 42.35988 | 77 Massachusetts Ave, Cambridge, MA 02139 |      0
     3 | 25 Wizard of Oz, Walaford, KS 99912323       |           |          |                                           |     -1

Examples: Using Geometry filter

SELECT g.rating, ST_AsText(ST_SnapToGrid(g.geomout,0.00001)) As wktlonlat, 
    (addy).address As stno, (addy).streetname As street, 
	(addy).streettypeabbrev As styp, 
	(addy).location As city, (addy).stateabbrev As st,(addy).zip 
  FROM geocode('100 Federal Street, MA',
		3, 
		(SELECT ST_Union(the_geom) 
			FROM place WHERE statefp = '25' AND name = 'Lynn')::geometry
		) As g;

 rating |        wktlonlat         | stno | street  | styp | city | st |  zip
--------+--------------------------+------+---------+------+------+----+-------
      8 | POINT(-70.96796 42.4659) |  100 | Federal | St   | Lynn | MA | 01905
Total query runtime: 245 ms.
	      

See Also

Normalize_Address, Pprint_Addy, ST_AsText, ST_SnapToGrid, ST_X, ST_Y