This chapter documents features found in the extras folder of the PostGIS source tarballs and source repository. These are not always packaged with PostGIS binary releases, but they are usually plpgsql-based or standard shell scripts that can be run as-is.
Abstract
A plpgsql-based geocoder written to work with the TIGER (Topologically Integrated Geographic Encoding and Referencing system)/Line and Master Address database export released by the US Census Bureau. In prior versions, the TIGER files were released in ASCII format. The older geocoder that works with that format is in extras/tiger_geocoder/tiger_2006andbefore.
There are four components to the geocoder: the data loader functions, the address normalizer, the address geocoder, and the reverse geocoder. The latest version, updated to use the TIGER 2010 census data, is located in the extras/tiger_geocoder/tiger_2010 folder.
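To make the address normalizer component concrete, here is a toy Python sketch of what normalizing an address means: splitting a free-form address into separate fields, standardizing street types, directionals, and states against lookup tables, and then pretty-printing the result. Everything in it is hypothetical (the tiny lookup tables, the ToyAddy type, the function names, and the comma-separated input format it expects); it is not the geocoder's plpgsql implementation, which handles far more variation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, tiny lookup tables; the packaged geocoder uses its own,
# much larger tables for street types, directionals, and states.
STREET_TYPES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave",
                "road": "Rd", "rd": "Rd", "boulevard": "Blvd", "blvd": "Blvd"}
DIRECTIONS = {"north": "N", "n": "N", "south": "S", "s": "S",
              "east": "E", "e": "E", "west": "W", "w": "W"}
STATES = {"massachusetts": "MA", "ma": "MA", "nevada": "NV", "nv": "NV"}


@dataclass
class ToyAddy:
    """Loosely modeled on a few norm_addy fields; not the real type."""
    address: Optional[int] = None
    predir: Optional[str] = None
    street_name: str = ""
    street_type: Optional[str] = None
    location: Optional[str] = None
    state: Optional[str] = None
    zip: Optional[str] = None


def toy_normalize(text: str) -> ToyAddy:
    """Parse '<number> [dir] <name> [type], <city>, <state> [zip]'."""
    parts = [p.strip() for p in text.split(",")]
    tokens = parts[0].replace(".", "").split()
    addy = ToyAddy()
    if tokens and tokens[0].isdigit():
        addy.address = int(tokens.pop(0))
    # Strip a trailing street type, then a leading directional, but never
    # consume the last remaining token (so "East St" keeps "East" as the name).
    if len(tokens) > 1 and tokens[-1].lower() in STREET_TYPES:
        addy.street_type = STREET_TYPES[tokens.pop().lower()]
    if len(tokens) > 1 and tokens[0].lower() in DIRECTIONS:
        addy.predir = DIRECTIONS[tokens.pop(0).lower()]
    addy.street_name = " ".join(tokens)
    if len(parts) > 1:
        addy.location = parts[1] or None
    if len(parts) > 2:
        rest = parts[2].split()
        if rest and rest[-1].isdigit():
            addy.zip = rest.pop()
        state = " ".join(rest).lower()
        # Standardize known state names; otherwise keep the raw text uppercased.
        addy.state = STATES.get(state, state.upper() or None)
    return addy


def toy_pprint(a: ToyAddy) -> str:
    """Reassemble the separate fields into one standardized line."""
    street = " ".join(str(x) for x in (a.address, a.predir, a.street_name, a.street_type) if x)
    tail = " ".join(x for x in (a.state, a.zip) if x)
    return ", ".join(x for x in (street, a.location, tail) if x)
```

For example, toy_normalize("202 East Fremont Street, Las Vegas, Nevada 89101") splits out the house number, the "E" directional, the "St" type, and the "NV" state, and toy_pprint turns that back into "202 E Fremont St, Las Vegas, NV 89101".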
Although it is designed specifically for the US, many of its concepts and functions are applicable and can be adapted to work with other countries' address and road networks.
The script builds a schema called tiger to house all the tiger-related functions, reusable lookup data (such as road type prefixes, suffixes, and states), various control tables for managing data loads, and skeleton base tables from which all the loaded TIGER tables inherit.
Another schema called tiger_data is also created, which houses all the census data for each state that the loader downloads from the Census site and loads into the database. In the current model, each set of state tables is prefixed with the state code (e.g. ma_addr, ca_edges), with constraints that restrict each table to that state's data. Each of these tables inherits from the base addr, faces, edges, etc. tables located in the tiger schema.
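The per-state layout described above maps onto PostgreSQL table inheritance with a CHECK constraint on each child table. The Python sketch below only builds the DDL string for one such child table; the helper name, the constraint name chk_statefp, the small STATE_FIPS subset, and the assumption that each base table has a statefp column holding the state FIPS code are all illustrative, not taken from the loader's actual scripts.

```python
# Hypothetical subset of state FIPS codes, keyed by postal abbreviation.
STATE_FIPS = {"ma": "25", "ca": "06", "nv": "32"}


def state_table_ddl(state: str, base: str, schema: str = "tiger_data") -> str:
    """Build DDL for a per-state child table such as tiger_data.ma_addr.

    Assumes the parent tiger.<base> has a statefp column. Uses plain string
    formatting, so only call it with trusted, known identifiers.
    """
    st = state.lower()
    fips = STATE_FIPS[st]          # KeyError for states not in the subset
    child = f"{schema}.{st}_{base}"
    return (
        f"CREATE TABLE {child} ("
        # Table-level CHECK on an inherited column: rows from any other
        # state are rejected by this child table.
        f"CONSTRAINT chk_statefp CHECK (statefp = '{fips}')"
        f") INHERITS (tiger.{base});"
    )
```

With the defaults, state_table_ddl("MA", "addr") produces CREATE TABLE tiger_data.ma_addr (CONSTRAINT chk_statefp CHECK (statefp = '25')) INHERITS (tiger.addr); passing a different schema places the child elsewhere while it still inherits from the tiger base table.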
All the geocode functions reference only the base tables, so there is no requirement that the data schema be called tiger_data, nor any restriction against further partitioning the data into other schemas (for example, a different schema for each state), as long as all the tables inherit from the tables in the tiger schema.
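Why the schema name does not matter can be shown with a small in-memory analogy in Python: a parent "table" whose scan also yields every descendant's rows, the way a query on a PostgreSQL parent table also reads its inheriting children. The Table class and toy_lookup function are hypothetical stand-ins written for this illustration; they model the behavior, not how PostgreSQL implements inheritance.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional

Row = dict


@dataclass
class Table:
    """Toy stand-in for a table that may have inheriting child tables."""
    name: str
    check: Optional[Callable[[Row], bool]] = None  # plays the role of a CHECK constraint
    rows: List[Row] = field(default_factory=list)
    children: List["Table"] = field(default_factory=list)

    def inherit(self, child: "Table") -> "Table":
        # Register a child; its schema-qualified name plays no role in lookups.
        self.children.append(child)
        return child

    def insert(self, row: Row) -> None:
        if self.check is not None and not self.check(row):
            raise ValueError(f"row violates CHECK constraint on {self.name}")
        self.rows.append(row)

    def scan(self) -> Iterator[Row]:
        # Reading the parent also yields rows from every descendant table.
        yield from self.rows
        for child in self.children:
            yield from child.scan()


def toy_lookup(base: Table, street: str) -> List[Row]:
    """A 'geocode' step that only ever references the base table."""
    return [r for r in base.scan() if r["street"] == street]
```

Here a Massachusetts table in tiger_data and a California table in a per-state schema can both hang off the same base table, and toy_lookup finds rows in either without knowing where they live.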
If you are using a prerelease version of the PostGIS 2.0.0 tiger geocoder, you can upgrade the scripts using the accompanying upgrade_geocoder.bat / .sh scripts in tiger_2010. We'll be refining the upgrade scripts until release.
Design:
The goal of this project is to build a fully functional geocoder that can process an arbitrary address string and, using normalized TIGER census data, produce a point geometry and a rating reflecting the location of the given address and the likelihood of that location.
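This chapter does not describe how the point geometry is computed. One common technique in street-address geocoding, assumed here purely for illustration, is to take the street segment whose address range contains the house number and place the point proportionally along its length. The Python sketch below (the function name and signature are hypothetical) works in planar coordinates and ignores street side, offsets, and the rating entirely.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_address(line: List[Point], from_hn: int, to_hn: int, hn: int) -> Point:
    """Place house number hn along a polyline whose address range is from_hn..to_hn."""
    if len(line) < 2:
        raise ValueError("need at least two vertices")
    if not min(from_hn, to_hn) <= hn <= max(from_hn, to_hn):
        raise ValueError("house number outside the segment's address range")
    # Fraction of the way through the range; ranges may run high-to-low.
    frac = 0.0 if from_hn == to_hn else (hn - from_hn) / (to_hn - from_hn)
    seg_lengths = [math.dist(a, b) for a, b in zip(line, line[1:])]
    target = frac * sum(seg_lengths)  # planar distance to walk from the start
    for (a, b), length in zip(zip(line, line[1:]), seg_lengths):
        if length > 0 and target <= length:
            t = target / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length
    return line[-1]  # guards against floating-point overshoot at the end
```

For a segment (0, 0) to (10, 0) to (10, 10) numbered 100 to 200, house number 175 lies three quarters of the way along its 20 units of length, at (10.0, 5.0).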
The reverse_geocode function, introduced in PostGIS 2.0.0, is useful for deriving the street address and cross streets of a GPS location.
The geocoder should be simple for anyone familiar with PostGIS to install and use, on all platforms supported by PostGIS.
It should be robust enough to function properly despite formatting and spelling errors.
It should be extensible enough to be used with future data updates, or alternate data sources with a minimum of coding changes.
… tiger_data if no schema is specified.
… tiger_data if no schema is specified.
… tiger_data schema. Each state script is returned as a separate record. The latest version supports the TIGER 2010 structural changes and also loads the census tract, block group, and block tables.
… tiger_data schema. Each state script is returned as a separate record.
… norm_addy type that has the road suffix, prefix, and type standardized, and the street, street name, etc. broken into separate fields. This function works with just the lookup data packaged with the tiger_geocoder (no TIGER census data is needed).
Takes a norm_addy composite type object and returns a pretty-print representation of it. Usually used in conjunction with normalize_address.

There is another geocoder for PostGIS that is gaining in popularity and is more suitable for international use. It is called Nominatim and uses OpenStreetMap gazetteer-formatted data. It requires osm2pgsql for loading the data, and PostgreSQL 8.4+ and PostGIS 1.5+ to function. It is packaged as a web service interface and seems designed to be called as a web service. Just like the tiger geocoder, it has both a geocoder and a reverse geocoder component. From the documentation, it is unclear whether it has a pure SQL interface like the tiger geocoder, or whether a good deal of the logic is implemented in the web interface.