debug_standardize_address — Returns a json formatted text listing the parse tokens and standardizations
text debug_standardize_address(
text lextab, text gaztab, text rultab, text micro, text macro=NULL)
;
This is a function for debugging address standardizer rules and lex/gaz mappings. It returns a json formatted text that includes the matching rules, mapping of tokens, and best standardized address stdaddr form of an input address utilizing lex table table name, gaz table, and rules table table names and an address.
For single line addresses use just micro
For two line address A micro
consisting of standard first line of postal address e.g. house_num street
, and a macro consisting of standard postal second line of an address e.g city, state postal_code country
.
Elements returned in the json document are
For each word in the input address, returns the position of the word, token categorization of the word, and the standard word it is mapped to. Note that for some input words, you might get back multiple records because some inputs can be categorized as more than one thing.
The set of rules matching the input and the corresponding score for each. The first rule (highest scoring) is what is used for standardization
The standardized address elements stdaddr that would be returned when running standardize_address
Availability: 3.4.0
This method needs address_standardizer extension.
Using address_standardizer_data_us extension
CREATE EXTENSION address_standardizer_data_us; -- only needs to be done once
Variant 1: Single line address and returning the input tokens
SELECT it->>'pos' AS position, it->>'word' AS word, it->>'stdword' AS standardized_word, it->>'token' AS token, it->>'token-code' AS token_code FROM jsonb( debug_standardize_address('us_lex', 'us_gaz', 'us_rules', 'One Devonshire Place, PH 301, Boston, MA 02109') ) AS s, jsonb_array_elements(s->'input_tokens') AS it;
position | word | standardized_word | token | token_code ----------+------------+-------------------+--------+------------ 0 | ONE | 1 | NUMBER | 0 0 | ONE | 1 | WORD | 1 1 | DEVONSHIRE | DEVONSHIRE | WORD | 1 2 | PLACE | PLACE | TYPE | 2 3 | PH | PATH | TYPE | 2 3 | PH | PENTHOUSE | UNITT | 17 4 | 301 | 301 | NUMBER | 0 (7 rows)
Variant 2: Multi line address and returning first rule input mappings and score
SELECT (s->'rules'->0->>'score')::numeric AS score, it->>'pos' AS position, it->>'input-word' AS word, it->>'input-token' AS input_token, it->>'mapped-word' AS standardized_word, it->>'output-token' AS output_token FROM jsonb( debug_standardize_address('us_lex', 'us_gaz', 'us_rules', 'One Devonshire Place, PH 301', 'Boston, MA 02109') ) AS s, jsonb_array_elements(s->'rules'->0->'rule_tokens') AS it;
score | position | word | input_token | standardized_word | output_token ----------+----------+------------+-------------+-------------------+-------------- 0.876250 | 0 | ONE | NUMBER | 1 | HOUSE 0.876250 | 1 | DEVONSHIRE | WORD | DEVONSHIRE | STREET 0.876250 | 2 | PLACE | TYPE | PLACE | SUFTYP 0.876250 | 3 | PH | UNITT | PENTHOUSE | UNITT 0.876250 | 4 | 301 | NUMBER | 301 | UNITT (5 rows)