George MacKerron: code blog

GIS, software development, and other snippets

Simple PostGIS nearest neighbour function

Here’s a less generic and slightly different nearest-neighbour function based on Regina’s generic nearest-neighbour function at Boston GIS.

It follows the same basic idea of using series of enlarging search radii to restrict distance calculations to a manageable subset of things-that-might-be-near. The difference is that it uses a geometric progression of sizes (x, x * y, x * y^2, x * y^3, ...) instead of an arithmetic one (x, x + y, x + 2y, x + 3y, ...).

For some distributions of things-that-might-be-near, and tuned with the right parameters (x, y), this turns out substantially faster (I’ve used it to locate the nearest UK postcode to each mappiness response).

Note that it’s a lot simpler and less generic than the Boston GIS alternative: it returns only one nearest neighbour, without the associated distance, and is also less flexible about what it accepts as input.

create or replace function 
  nn(nearTo                   geometry
   , initialDistance          real
   , distanceMultiplier       real 
   , maxPower                 integer
   , nearThings               text
   , nearThingsIdField        text
   , nearThingsGeometryField  text)
returns integer as $$
  sql     text;
  result  integer;
  sql := ' select ' || quote_ident(nearThingsIdField) 
      || ' from '   || quote_ident(nearThings)
      || ' where st_dwithin($1, ' 
      ||   quote_ident(nearThingsGeometryField) || ', $2 * ($3 ^ $4))'
      || ' order by st_distance($1, ' || quote_ident(nearThingsGeometryField) || ')'
      || ' limit 1';
  for i in 0..maxPower loop
    execute sql into result using nearTo              -- $1
                                , initialDistance     -- $2
                                , distanceMultiplier  -- $3
                                , i;                  -- $4
    if result is not null then return result; end if;
  end loop;
  return null;
$$ language 'plpgsql' stable;

Update, August 2011 — Looking at this again, I realise there was an error in the function originally posted above. Because it did only a simple bounding-box check at each step, it was possible for it to return a geometry that was not in fact the nearest, but could in the worst case be up to 41% (sqrt(2) times) further away than the nearest. The code has been updated to fix this by using st_dwithin. I’ve also tidied it up a bit.


Written by George

March 10th, 2011 at 12:31 pm

Posted in GIS,PostGIS,SQL