Calculate how homogeneously distributed or concentrated a set of boxes is within one dimension, returning the range_quintile of the overlap counts per cell in a uniform partition of the extent of the dimension.
A uniform distribution of counts will have a small range and will require few cells in a selectivity histogram. A diverse distribution of counts will have a larger range and require more cells in a selectivity histogram (to distinguish between areas of feature density and areas of feature sparseness). This measurement should help us identify cases like X/Y/Z data where there is lots of variability in density in X/Y (spread over a multi-kilometer range) and far less in Z (within a few-hundred-meter range).
static int num_bins = 50;
int counts[num_bins];
#if POSTGIS_DEBUG_LEVEL >= 3
double average, sdev, sdev_ratio;
for ( d = 0; d < ndims; d++ )
    memset(counts, 0, sizeof(int)*num_bins);
    smin = extent->min[d];
    smax = extent->max[d];
    swidth = smax - smin;
    for ( i = 0; i < num_boxes; i++ )
        double minoffset, maxoffset;
        if ( ! ndb ) continue;
        minoffset = ndb->min[d] - smin;
        maxoffset = ndb->max[d] - smin;
        if ( minoffset < 0 || minoffset > swidth ||
             maxoffset < 0 || maxoffset > swidth )
        bmin = num_bins * (minoffset) / swidth;
        bmax = num_bins * (maxoffset) / swidth;
        POSTGIS_DEBUGF(4, " dimension %d, feature %d: bin %d to bin %d", d, i, bmin, bmax);
        for ( k = bmin; k <= bmax; k++ )
#if POSTGIS_DEBUG_LEVEL >= 3
    average = avg(counts, num_bins);
    sdev = stddev(counts, num_bins);
    sdev_ratio = sdev/average;
    POSTGIS_DEBUGF(3, " dimension %d: range = %d", d, range);
    POSTGIS_DEBUGF(3, " dimension %d: average = %.6g", d, average);
    POSTGIS_DEBUGF(3, " dimension %d: stddev = %.6g", d, sdev);
    POSTGIS_DEBUGF(3, " dimension %d: stddev_ratio = %.6g", d, sdev_ratio);
    distribution[d] = range;
#define MIN_DIMENSION_WIDTH
Minimum width of a dimension that we'll bother trying to compute statistics on.
static int range_quintile(int *vals, int nvals)
The difference between the fourth and first quintile values, the "inter-quintile range".
N-dimensional box type for calculations, to avoid doing explicit axis conversions from GBOX in all ca...
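The "inter-quintile range" described for range_quintile can be sketched as follows. This is an illustrative version under stated assumptions, not the library's code: the names `range_quintile_sketch` and `cmp_int_sketch` are hypothetical, the quintile positions are taken as the sorted elements at indexes `nvals/5` and `4*nvals/5`, and the sketch sorts a private copy rather than the caller's array.

```c
#include <stdlib.h>
#include <string.h>

/* Ascending comparator for qsort. */
static int
cmp_int_sketch(const void *a, const void *b)
{
    int ia = *(const int *) a;
    int ib = *(const int *) b;
    return (ia > ib) - (ia < ib);
}

/*
 * Inter-quintile range: the sorted value at the 4/5 position minus the
 * sorted value at the 1/5 position. Works on a copy, so the input is
 * left untouched. Returns -1 if nvals is not positive or the copy
 * cannot be allocated.
 */
static int
range_quintile_sketch(const int *vals, int nvals)
{
    int *sorted;
    int range;

    if (nvals <= 0)
        return -1;

    sorted = (int *) malloc(sizeof(int) * nvals);
    if (!sorted)
        return -1;

    memcpy(sorted, vals, sizeof(int) * nvals);
    qsort(sorted, nvals, sizeof(int), cmp_int_sketch);

    range = sorted[4 * nvals / 5] - sorted[nvals / 5];
    free(sorted);
    return range;
}
```

Applied to per-cell counts, identical counts in every cell give a range of 0, while counts concentrated in a few cells give a large range, which is the signal the distribution measurement relies on.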