Why is working with big shapes slow?
Because they are big! All things being equal, it is going to take more time to do calculations against a 100,000-vertex polygon of Canada than against a 5-vertex polygon of Colorado.
What drives that extra time?
- Just getting the object into memory to work with. An object larger than about 4 KB (any polygon with more than roughly 200 vertices) will be chopped into smaller pieces by the PostgreSQL TOAST system and stored in a side table. Getting the object into memory involves retrieving all the pieces and concatenating them back together. They have usually also been compressed on storage, so there’s a decompression step too. (The size-inspection sketch after this list shows how to spot affected geometries.)
- Having to pull the whole object into memory to answer a very local question. You might only care whether a fishing boat in the Pacific is within 100 km of the Canadian coastline, but you’ll be pulling all the Atlantic provinces into memory to answer that question.
- Running calculations on a large number of vertices can take a very long time. PostGIS does its best to temporarily index and cache geometries that show up in repeated calculations, to keep processing costs down, but bigger objects are still bigger.
- Spatially large objects have large bounding boxes, which means inefficient index scans. Even for objects with relatively few vertices, a bad bounding box can cause a lot of computational churn. The bounding box of France, for example, includes not only continental France but also the islands of Saint Pierre and Miquelon in the Gulf of Saint Lawrence, on the other side of the Atlantic. As a result, a query in the middle of the North Atlantic could return France from the index scan, and France would then have to be excluded via a more expensive exact calculation. (The bounding-box sketch after this list shows how to inspect an extent.)
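To see which geometries are big enough to be TOASTed, you can compare vertex counts to stored column sizes. Here is a minimal sketch, assuming a hypothetical `countries` table with `name` and `geom` columns:

```sql
-- Vertex count and on-disk size of each geometry.
-- pg_column_size() reports the stored (possibly compressed) size.
SELECT name,
       ST_NPoints(geom) AS vertices,
       pg_column_size(geom) AS stored_bytes
FROM countries
ORDER BY stored_bytes DESC;
```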
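You can also inspect the extent an index scan will use. Again assuming the hypothetical `countries` table:

```sql
-- Box2D() shows the bounding box the spatial index sees. For France,
-- it spans the Atlantic because of Saint Pierre and Miquelon.
SELECT name, Box2D(geom) AS bbox
FROM countries
WHERE name = 'France';
```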
What can be done?
- The most effective tool for improving performance on large objects is ST_Subdivide(). It takes in a single large geometry and outputs a set of smaller geometries, each with a fixed maximum number of vertices. By chopping up objects while retaining a reference back to the original table via a primary key, you can effectively “normalize” your geometries into a more homogeneous object size for faster spatial searching, as sketched below. Users universally report good results from pre-conditioning their geometries in this way.
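Here is a minimal sketch of that pre-conditioning, assuming a hypothetical `countries(id, geom)` table. The vertex cap of 255 is arbitrary; the ST_Subdivide() default is 256:

```sql
-- One row per piece, each piece capped at 255 vertices,
-- carrying the original primary key for joining back.
CREATE TABLE countries_subdivided AS
SELECT id, ST_Subdivide(geom, 255) AS geom
FROM countries;

-- Index the pieces; small objects mean tight bounding boxes.
CREATE INDEX countries_subdivided_geom_x
  ON countries_subdivided USING GIST (geom);
```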
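Queries then run against the small pieces, with a DISTINCT (or EXISTS) to collapse multiple matching pieces of the same original object. For example, the fishing-boat question from above, with a made-up boat position and assuming the data are in SRID 4326:

```sql
-- Which countries have territory within 100 km of the boat?
-- The geography casts make the third argument meters.
SELECT DISTINCT s.id
FROM countries_subdivided s
WHERE ST_DWithin(
        s.geom::geography,
        ST_SetSRID(ST_MakePoint(-131.0, 51.5), 4326)::geography,
        100000);
```

Note that the geography cast bypasses the plain geometry index; for production use you would index the cast expression itself, e.g. `CREATE INDEX ... ON countries_subdivided USING GIST ((geom::geography))`.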