This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer).
The address standardizer is a single-line address parser that takes an input address and normalizes it based on a set of rules stored in a rules table, with the help of lex and gaz tables.
The code is built into a single PostgreSQL extension library called address_standardizer, which can be installed with: CREATE EXTENSION address_standardizer;
In addition to the address_standardizer extension, a sample data extension called address_standardizer_data_us is built, which contains gaz, lex, and rules tables for US data. This extension can be installed via: CREATE EXTENSION address_standardizer_data_us;
The code for this extension can be found in the PostGIS extensions/address_standardizer directory and is currently self-contained.
For installation instructions refer to: Section 2.8, “Installing and Using the address standardizer”.
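As a quick sanity check after installation, the two extensions can be created and then looked up in the standard pg_extension catalog. This is a minimal sketch, assuming both extensions were built and are available to the server; the IF NOT EXISTS clause is only there so the statements can be re-run safely.

```sql
-- Install the parser and the sample US data (no-op if already installed).
CREATE EXTENSION IF NOT EXISTS address_standardizer;
CREATE EXTENSION IF NOT EXISTS address_standardizer_data_us;

-- List the installed address standardizer extensions and their versions.
SELECT extname, extversion
FROM pg_extension
WHERE extname LIKE 'address_standardizer%';
```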
The parser works from right to left, looking first at the macro elements (postcode, state/province, city) and then at the micro elements to determine whether we are dealing with a house number and street, an intersection, or a landmark. It currently does not look for a country code or name, but that could be introduced in the future.
The country is assumed to be US or CA based on: the postcode being US or Canadian, then the state/province being US or Canadian; otherwise US.
The postcode and state/province are recognized using Perl-compatible regular expressions. These regexes are currently in parseaddress-api.c; they are relatively simple to change if needed, and could be moved into include files in the future for easier maintenance.
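One way to observe the macro/micro split is the parse_address function that ships with the extension, which breaks a single-line address into its parts. The query below is a minimal sketch, assuming address_standardizer is installed; the sample address is only illustrative, and the exact values returned depend on the installed version.

```sql
-- Split a one-line address into micro parts (num, street)
-- and macro parts (city, state, zip, zipplus, country).
SELECT num, street, city, state, zip, zipplus, country
FROM parse_address('1 Devonshire Place, Boston, MA 02109-1234');
```

No country appears in the input, so the country column shows the parser's inference from the postcode and state.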
standardize_address function.
This section lists the PostgreSQL table formats used by the address_standardizer for normalizing addresses. Note that these tables do not need to be named the same as what is referenced here. For example, you can have different lex, gaz, and rules tables for each country, or for your custom geocoder. The names of these tables get passed into the address standardizer functions.
The packaged extension address_standardizer_data_us
contains data for standardizing US addresses.
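The following sketch shows how the table names are passed to standardize_address. It assumes both extensions are installed and that the packaged US tables are named us_lex, us_gaz, and us_rules; the sample address is only illustrative. To use your own tables, substitute their names for the first three arguments.

```sql
-- Single-line form: arguments are the lex, gaz, and rules table names,
-- followed by the full address.
SELECT house_num, name, suftype, unit, city, state, postcode
FROM standardize_address('us_lex', 'us_gaz', 'us_rules',
                         'One Devonshire Place, PH 301, Boston, MA 02109');

-- Two-part form: the micro part (street address) and the macro part
-- (city, state, postcode) are passed separately.
SELECT house_num, name, suftype, unit, city, state, postcode
FROM standardize_address('us_lex', 'us_gaz', 'us_rules',
                         'One Devonshire Place, PH 301',
                         'Boston, MA 02109');
```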