Simple PostGIS nearest neighbour function
Here’s a less generic and slightly different nearest-neighbour function based on Regina’s generic nearest-neighbour function at Boston GIS.
It follows the same basic idea of using series of enlarging search radii to restrict distance calculations to a manageable subset of things-that-might-be-near. The difference is that it uses a geometric progression of sizes (x, x * y, x * y^2, x * y^3, ...
) instead of an arithmetic one (x, x + y, x + 2y, x + 3y, ...
).
For some distributions of things-that-might-be-near, and tuned with the right parameters (x, y
), this turns out substantially faster (I’ve used it to locate the nearest UK postcode to each mappiness response).
Note that it’s a lot simpler and less generic than the Boston GIS alternative: it returns only one nearest neighbour, without the associated distance, and is also less flexible about what it accepts as input.
create or replace function nn(nearTo geometry , initialDistance real , distanceMultiplier real , maxPower integer , nearThings text , nearThingsIdField text , nearThingsGeometryField text) returns integer as $$ declare sql text; result integer; begin sql := ' select ' || quote_ident(nearThingsIdField) || ' from ' || quote_ident(nearThings) || ' where st_dwithin($1, ' || quote_ident(nearThingsGeometryField) || ', $2 * ($3 ^ $4))' || ' order by st_distance($1, ' || quote_ident(nearThingsGeometryField) || ')' || ' limit 1'; for i in 0..maxPower loop execute sql into result using nearTo -- $1 , initialDistance -- $2 , distanceMultiplier -- $3 , i; -- $4 if result is not null then return result; end if; end loop; return null; end $$ language 'plpgsql' stable; |
Update, August 2011 — Looking at this again, I realise there was an error in the function originally posted above. Because it did only a simple bounding-box check at each step, it was possible for it to return a geometry that was not in fact the nearest, but could in the worst case be up to 41% (sqrt(2) times) further away than the nearest. The code has been updated to fix this by using st_dwithin
. I’ve also tidied it up a bit.
Pingback: Infrastructure Coverage based on Open Data | Free and Open Source GIS Ramblings()