<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>George MacKerron: code blog &#187; Ruby</title>
	<atom:link href="http://blog.mackerron.com/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mackerron.com</link>
	<description>GIS, software development, and other snippets</description>
	<lastBuildDate>Mon, 09 Aug 2010 08:29:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Using OS Code-Point Polygons in PostGIS</title>
		<link>http://blog.mackerron.com/2009/11/code-point-polygons-postgis/</link>
		<comments>http://blog.mackerron.com/2009/11/code-point-polygons-postgis/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 12:13:43 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[GIS]]></category>
		<category><![CDATA[PostGIS]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.mackerron.com/?p=193</guid>
		<description><![CDATA[Ordnance Survey&#8217;s Code-Point with Polygons &#8220;provides a precise geographical location for each postcode unit in Great Britain&#8221;. It&#8217;s available in various formats, including ESRI .shp files. Many UK academics can access the data via institutional subscription to EDINA Digimap. I&#8217;m using it in my research into subjective wellbeing and environmental quality. This post shows how [...]]]></description>
			<content:encoded><![CDATA[<p>Ordnance Survey&#8217;s <a href="http://www.ordnancesurvey.co.uk/oswebsite/products/codepointpolygons/">Code-Point with Polygons</a> &#8220;provides a precise geographical location for each postcode unit in Great Britain&#8221;. It&#8217;s available in various formats, including <span class="caps">ESRI</span> .shp files. </p>

<p>Many UK academics can access the data via institutional subscription to <a href="http://edina.ac.uk/digimap/"><span class="caps">EDINA</span> Digimap</a>. I&#8217;m using it in <a href="http://personal.lse.ac.uk/MACKERRO/">my research into subjective wellbeing and environmental quality</a>.</p>

<p>This post shows how to:</p>


<ol>
<li><strong>import the data files</strong> into a <a href="http://postgis.refractions.net/">PostGIS</a> database; and</li>
<li><strong>de-normalise the data into a single table</strong>, where there&#8217;s a one-to-one mapping of postcodes to rows, and each row contains either all geographical locations covered by a postcode (as a single geometry column, of type multipolygon) or the reason why no such location is available</li>
</ol>



<p><span id="more-193"></span></p>

<h3>Why de-normalise?</h3>

<p>Step 2 above is required for my purposes because Code-Point data is supplied in a number of separate files per postcode area: </p>


<ul>
<li>a .shp file (and associated .shx and .dbf) mapping postcodes and &#8216;vertical streets&#8217; to the locations they cover;</li>
<li>an accompanying text file mapping postcodes to &#8216;vertical streets&#8217;; and</li>
<li>another text file listing postcodes with no associated locations, either because they represent PO boxes or because the data just isn&#8217;t available.</li>
</ul>



<p>Vertical streets generally represent high-rise buildings, where one location in 2D space may be associated with multiple postcodes. Vertical streets are also a serious pain: not only may one vertical street be associated with many postcodes, but one postcode may be associated with many vertical streets <em>and</em> with non-vertical-street locations too. </p>

<h2>Import the data files</h2>

<p>Download and unzip the Code-Point with Polygons data, in <span class="caps">ESRI</span> .shp format, for the regions you want. Where indicative query timings are provided later, these are for the whole of Great Britain (120 postcode areas) on a 2 GHz Intel iMac.</p>

<p>Dump everything in the same directory. For each postcode area XX you should have the following five files:</p>


<ul>
<li><span class="caps">XX.</span>shp, <span class="caps">XX.</span>shx and <span class="caps">XX.</span>dbf</li>
<li>XX_vstreet_lookup.txt</li>
<li>XX_discard.txt</li>
</ul>
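<p>Before importing, it may be worth checking that nothing is missing. The following is a minimal sketch, assuming the file naming described above; <code>missing_companions</code> is a hypothetical helper name, and the directory path is a placeholder.</p>

```ruby
# Hypothetical helper: for each .shp file in dir, list any companion files
# (named as described above) that are not present alongside it.
def missing_companions(dir)
  missing = {}
  Dir[File.join(dir, '*.shp')].sort.each do |shp|
    base = shp.sub(/\.shp\z/, '')
    expected = ["#{base}.shx", "#{base}.dbf",
                "#{base}_vstreet_lookup.txt", "#{base}_discard.txt"]
    absent = expected.reject { |f| File.exist?(f) }
    missing[File.basename(base)] = absent.map { |f| File.basename(f) } unless absent.empty?
  end
  missing
end

# Placeholder path: prints one line per postcode area with missing files.
missing_companions('/path/to/CodePoint polygons').each do |area, files|
  puts "#{area}: missing #{files.join(', ')}"
end
```

<p>An empty result means every .shp file found has its four companions.</p>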



<p>If you don&#8217;t already have a suitable one, create a PostGIS-enabled PostgreSQL database (in this post, the database is named <code>geo</code>). If you&#8217;re running a PostgreSQL version earlier than 8.4, you may need to raise the free space map settings (<code>max_fsm_pages</code> and <code>max_fsm_relations</code>) in <code>postgresql.conf</code> in order to cope with a data set this large.</p>

<p>Execute the following <span class="caps">SQL </span>to create the tables for discards and vertical streets (e.g. from within pgAdmin):</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_vertical_streets <span style="color: #66cc66;">&#40;</span>
  postcode <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">8</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>
  vstreet <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">8</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_discards <span style="color: #66cc66;">&#40;</span>
  postcode <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">8</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>
  reason <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">7</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>




<p>Next we need to create the table for the .shp-file polygons. We <strong>switch to Ruby</strong> for this &#8212; you can just enter the commands in an <span class="caps">IRB </span>terminal session:</p>


<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">path = <span style="color:#996600;">'/path/to/CodePoint polygons'</span>
<span style="color:#996600;">`/usr/local/pgsql/bin/shp2pgsql &quot;#{path}/ab.shp&quot; cpp_polygons -p -D -s 27700 | /usr/local/pgsql/bin/psql -d geo -U postgres`</span></pre></div></div>




<p>The -p flag to <code>shp2pgsql</code> just creates the table structure without loading any data, -D selects the dump (<code>COPY</code>) output format rather than individual insert statements, and -s 27700 sets the <span class="caps">SRID </span>to the <span class="caps">EPSG </span>code for the British National Grid (<span class="caps">OSGB36 </span>datum). You can use any of the .shp files you&#8217;ve downloaded; it need not be the AB area one.</p>
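<p>If your paths contain spaces or other awkward characters, one option is to build the command line with Ruby&#8217;s standard <code>Shellwords</code> library instead of quoting by hand. This is only a sketch: <code>shp2pgsql_pipeline</code> is a hypothetical helper, it assumes both binaries are on your <code>PATH</code>, and the database name and user are the ones used in this post.</p>

```ruby
require 'shellwords'

# Hypothetical helper: returns the shell pipeline as a string.
# mode is :create (-p, table structure only) or :append (-a, add rows).
def shp2pgsql_pipeline(shp_path, table, mode, srid = 27700, db = 'geo', user = 'postgres')
  flag = { create: '-p', append: '-a' }.fetch(mode)
  load_cmd = ['shp2pgsql', flag, '-D', '-s', srid.to_s, shp_path, table].shelljoin
  psql_cmd = ['psql', '-d', db, '-U', user].shelljoin
  "#{load_cmd} | #{psql_cmd}"
end

# Print the command rather than running it, so it can be inspected first.
puts shp2pgsql_pipeline('/path/to/CodePoint polygons/ab.shp', 'cpp_polygons', :create)
```

<p>The returned string can be passed to backticks or <code>system</code> in the same way as the hand-written command above.</p>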

<p><strong>Now to do the import</strong> into the three tables. Still in Ruby, execute the following loop:</p>


<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC00FF; font-weight:bold;">Dir</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;#{path}/*.shp&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>f<span style="color:#006600; font-weight:bold;">|</span>
  <span style="color:#CC0066; font-weight:bold;">puts</span> f
  <span style="color:#996600;">`/usr/local/pgsql/bin/shp2pgsql &quot;#{f}&quot; cpp_polygons -a -D -s 27700 | /usr/local/pgsql/bin/psql -d geo -U postgres`</span>
  <span style="color:#996600;">`echo &quot;copy cpp_vertical_streets from '#{f.sub(/<span style="color:#000099;">\.</span>shp$/, '_vstreet_lookup.txt')}' csv;&quot; | /usr/local/pgsql/bin/psql geo postgres`</span>
  <span style="color:#996600;">`echo &quot;copy cpp_discards from '#{f.sub(/<span style="color:#000099;">\.</span>shp$/, '_discard.txt')}' csv;&quot; | /usr/local/pgsql/bin/psql geo postgres`</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>




<p>Back in <span class="caps">SQL, </span>add some boolean flags we&#8217;ll use later on, then create an index on the postcode column, which will save a <strong>lot</strong> of time later:</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">alter</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">add</span> <span style="color: #1a008a; font-weight: bold;">column</span> <span style="color: #1a008a; font-weight: bold;">discard</span> <span style="color: #1a008a; font-weight: bold;">boolean</span>;
<span style="color: #1a008a; font-weight: bold;">alter</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">add</span> <span style="color: #1a008a; font-weight: bold;">column</span> vstreet <span style="color: #1a008a; font-weight: bold;">boolean</span>;
<span style="color: #1a008a; font-weight: bold;">alter</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">add</span> <span style="color: #1a008a; font-weight: bold;">column</span> vstreet_and_std <span style="color: #1a008a; font-weight: bold;">boolean</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">index</span> pc_index <span style="color: #1a008a; font-weight: bold;">on</span> cpp_polygons <span style="color: #66cc66;">&#40;</span>postcode<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">vacuum</span> <span style="color: #1a008a; font-weight: bold;">analyze</span> cpp_polygons;</pre></div></div>




<p>If you like, you can confirm here that there&#8217;s only one row per postcode &#8212; this query <strong>should return nothing</strong>:</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">select</span> <span style="color: #66cc66;">*</span> 
<span style="color: #1a008a; font-weight: bold;">from</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">a</span> 
<span style="color: #1a008a; font-weight: bold;">inner</span> <span style="color: #1a008a; font-weight: bold;">join</span> cpp_polygons b 
<span style="color: #1a008a; font-weight: bold;">on</span> <span style="color: #1a008a; font-weight: bold;">a</span><span style="color: #66cc66;">.</span>postcode <span style="color: #66cc66;">=</span> b<span style="color: #66cc66;">.</span>postcode 
<span style="color: #1a008a; font-weight: bold;">where</span> <span style="color: #1a008a; font-weight: bold;">a</span><span style="color: #66cc66;">.</span>gid <span style="color: #66cc66;">&lt;&gt;</span> b<span style="color: #66cc66;">.</span>gid;</pre></div></div>




<p>So, we now have all the OS data held in three separate tables. In the rest of the post, we&#8217;ll be merging the data in the discards and vertical streets tables into the main table, cpp_polygons.</p>

<h2>Discards</h2>

<p>We&#8217;re going to create a new table, with the same structure as cpp_polygons, to hold the discarded postcode data. These postcodes will have a <span class="caps">NULL </span>geometry column, and a <span class="caps">TRUE </span>discard column (one of the booleans we added earlier). Later, we&#8217;ll insert the contents of this table into cpp_polygons.</p>

<p>Run the following <span class="caps">SQL</span>:</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">sequence</span> cpp_discard_seq <span style="color: #1a008a; font-weight: bold;">start</span> <span style="color: #1a008a; font-weight: bold;">with</span> <span style="color: #cc66cc;">2000000</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_discard_polys <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #66cc66;">&#40;</span>
<span style="color: #1a008a; font-weight: bold;">select</span> 
  <span style="color: #1a008a;">nextval</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc0000;">'cpp_discard_seq'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> gid<span style="color: #66cc66;">,</span> 
  postcode<span style="color: #66cc66;">,</span>
  <span style="color: #1a008a; font-weight: bold;">cast</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">null</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> upp<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">substring</span><span style="color: #66cc66;">&#40;</span>postcode <span style="color: #1a008a; font-weight: bold;">from</span> <span style="color: #cc0000;">'^[A-Z][A-Z]?'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> pc_area<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">cast</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">null</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">geometry</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> the_geom<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">true</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">discard</span><span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet<span style="color: #66cc66;">,</span>
  <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet_and_std
<span style="color: #1a008a; font-weight: bold;">from</span> cpp_discards
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>
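<p>The <code>pc_area</code> column above is derived with a regular expression that takes the one or two capital letters at the start of the postcode. As a quick illustration in Ruby (the method name <code>postcode_area</code> is just for this example):</p>

```ruby
# Leading one or two capital letters of a postcode, or nil if there are none.
# \A anchors at the start of the string, like ^ in the SQL pattern above.
def postcode_area(postcode)
  postcode[/\A[A-Z][A-Z]?/]
end

puts postcode_area('AB10 1AA')  # two-letter area
puts postcode_area('E1 6AN')    # one-letter area
```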




<h2>Vertical streets</h2>

<p>Similarly, we&#8217;re now going to create a new table with the same structure as cpp_polygons to map vertical street postcodes to the right polygons. This requires a left join of the vertical streets data with some of the geometry data in cpp_polygons.</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">set</span> enable_seqscan <span style="color: #66cc66;">=</span> <span style="color: #1a008a; font-weight: bold;">false</span>; 
<span style="color: #808080; font-style: italic;">-- (otherwise pg sometimes fails to use the index)</span>
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">sequence</span> cpp_vstreet_seq <span style="color: #1a008a; font-weight: bold;">start</span> <span style="color: #1a008a; font-weight: bold;">with</span> <span style="color: #cc66cc;">3000000</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_vstreet_polys <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #66cc66;">&#40;</span>
  <span style="color: #1a008a; font-weight: bold;">select</span> 
    <span style="color: #1a008a;">nextval</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc0000;">'cpp_vstreet_seq'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> gid<span style="color: #66cc66;">,</span> 
    <span style="color: #1a008a; font-weight: bold;">max</span><span style="color: #66cc66;">&#40;</span>v<span style="color: #66cc66;">.</span>postcode<span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> postcode<span style="color: #66cc66;">,</span> 
    <span style="color: #1a008a; font-weight: bold;">cast</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">null</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">varchar</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> upp<span style="color: #66cc66;">,</span> 
    <span style="color: #1a008a; font-weight: bold;">max</span><span style="color: #66cc66;">&#40;</span>pc_area<span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> pc_area<span style="color: #66cc66;">,</span> 
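    <span style="color: #1a008a;">st_multi</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a;">st_union</span><span style="color: #66cc66;">&#40;</span>the_geom<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> the_geom<span style="color: #66cc66;">,</span> 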
    <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">discard</span><span style="color: #66cc66;">,</span>
    <span style="color: #1a008a; font-weight: bold;">true</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet<span style="color: #66cc66;">,</span>
    <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet_and_std
  <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vertical_streets v 
  <span style="color: #1a008a; font-weight: bold;">left</span> <span style="color: #1a008a; font-weight: bold;">join</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">p</span> 
  <span style="color: #1a008a; font-weight: bold;">on</span> v<span style="color: #66cc66;">.</span>vstreet <span style="color: #66cc66;">=</span> <span style="color: #1a008a; font-weight: bold;">p</span><span style="color: #66cc66;">.</span>postcode
  <span style="color: #1a008a; font-weight: bold;">group</span> <span style="color: #1a008a; font-weight: bold;">by</span> v<span style="color: #66cc66;">.</span>postcode
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>
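<p>If the left join plus <code>group by</code> is hard to picture, here is the same grouping on a few in-memory rows. Everything here is invented for illustration: the postcodes, the vertical street identifiers, and the strings standing in for geometries.</p>

```ruby
# Toy rows standing in for cpp_vertical_streets: [postcode, vstreet].
LOOKUP = [
  ['AB1 1AA', 'VAB00001'],
  ['AB1 1AA', 'VAB00002'],
  ['AB1 1AB', 'VAB00001'],
  ['AB1 1AC', 'VAB00009']  # no matching polygon row
].freeze

# Toy stand-in for the vertical street rows of cpp_polygons.
POLYGONS = { 'VAB00001' => 'polygon 1', 'VAB00002' => 'polygon 2' }.freeze

# Group the polygons of each postcode's vertical streets. Postcodes with no
# matching polygon keep an empty list, mirroring the left join.
def polygons_by_postcode(lookup, polygons)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  lookup.each do |postcode, vstreet|
    members = grouped[postcode]  # creates the entry even if nothing matches
    members.push(polygons[vstreet]) if polygons.key?(vstreet)
  end
  grouped
end

polygons_by_postcode(LOOKUP, POLYGONS).each do |postcode, members|
  puts "#{postcode}: #{members.empty? ? 'NULL' : members.join(' + ')}"
end
```

<p>In the real query, the union of each group becomes that postcode&#8217;s geometry, and a postcode with no matching polygons ends up with a <span class="caps">NULL </span>geometry.</p>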




<p>Our last new table is for postcodes that are associated with both standard polygons <strong>and</strong> vertical street polygons.</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">sequence</span> cpp_vstreet_and_std_seq <span style="color: #1a008a; font-weight: bold;">start</span> <span style="color: #1a008a; font-weight: bold;">with</span> <span style="color: #cc66cc;">4000000</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_vstreet_and_std_polys <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #66cc66;">&#40;</span>
<span style="color: #1a008a; font-weight: bold;">select</span> 
  <span style="color: #1a008a;">nextval</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc0000;">'cpp_vstreet_and_std_seq'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> gid<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">p</span><span style="color: #66cc66;">.</span>postcode <span style="color: #1a008a; font-weight: bold;">as</span> postcode<span style="color: #66cc66;">,</span>
  <span style="color: #1a008a; font-weight: bold;">cast</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">null</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">varchar</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> upp<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">p</span><span style="color: #66cc66;">.</span>pc_area <span style="color: #1a008a; font-weight: bold;">as</span> pc_area<span style="color: #66cc66;">,</span>
  <span style="color: #1a008a;">st_multi</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a;">st_union</span><span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">p</span><span style="color: #66cc66;">.</span>the_geom<span style="color: #66cc66;">,</span> v<span style="color: #66cc66;">.</span>the_geom<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #1a008a; font-weight: bold;">as</span> the_geom<span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> <span style="color: #1a008a; font-weight: bold;">discard</span><span style="color: #66cc66;">,</span> 
  <span style="color: #1a008a; font-weight: bold;">false</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet<span style="color: #66cc66;">,</span>
  <span style="color: #1a008a; font-weight: bold;">true</span> <span style="color: #1a008a; font-weight: bold;">as</span> vstreet_and_std
<span style="color: #1a008a; font-weight: bold;">from</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">p</span>
<span style="color: #1a008a; font-weight: bold;">inner</span> <span style="color: #1a008a; font-weight: bold;">join</span> cpp_vstreet_polys v
<span style="color: #1a008a; font-weight: bold;">on</span> <span style="color: #1a008a; font-weight: bold;">p</span><span style="color: #66cc66;">.</span>postcode <span style="color: #66cc66;">=</span> v<span style="color: #66cc66;">.</span>postcode
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>




<h2>Cleaning and merging</h2>

<p>Now we need to remove the rows we just merged into this last table from their respective source tables: the standard polygons from cpp_polygons, and the vertical street polygons from cpp_vstreet_polys.</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">delete</span> <span style="color: #1a008a; font-weight: bold;">from</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">where</span> postcode <span style="color: #1a008a; font-weight: bold;">in</span> <span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">select</span> postcode <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vstreet_and_std_polys<span style="color: #66cc66;">&#41;</span>;
<span style="color: #808080; font-style: italic;">-- the above could take around 25 mins</span>
<span style="color: #1a008a; font-weight: bold;">delete</span> <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vstreet_polys <span style="color: #1a008a; font-weight: bold;">where</span> postcode <span style="color: #1a008a; font-weight: bold;">in</span> <span style="color: #66cc66;">&#40;</span><span style="color: #1a008a; font-weight: bold;">select</span> postcode <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vstreet_and_std_polys<span style="color: #66cc66;">&#41;</span>;</pre></div></div>




<p>And, finally, we merge our three new tables into the main table.</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">insert</span> <span style="color: #1a008a; font-weight: bold;">into</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">select</span> <span style="color: #66cc66;">*</span> <span style="color: #1a008a; font-weight: bold;">from</span> cpp_discard_polys;
<span style="color: #1a008a; font-weight: bold;">insert</span> <span style="color: #1a008a; font-weight: bold;">into</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">select</span> <span style="color: #66cc66;">*</span> <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vstreet_polys;
<span style="color: #1a008a; font-weight: bold;">insert</span> <span style="color: #1a008a; font-weight: bold;">into</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">select</span> <span style="color: #66cc66;">*</span> <span style="color: #1a008a; font-weight: bold;">from</span> cpp_vstreet_and_std_polys;</pre></div></div>




<p>In order to join my own data with this data, I find it easiest to format the postcodes on which the join is made with no spaces. So I add and index the following extra column.</p>


<div class="wp_syntax"><div class="code"><pre class="postgresql" style="font-family:monospace;"><span style="color: #1a008a; font-weight: bold;">alter</span> <span style="color: #1a008a; font-weight: bold;">table</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">add</span> <span style="color: #1a008a; font-weight: bold;">column</span> postcode_no_sp <span style="color: #1a008a; font-weight: bold;">character</span> <span style="color: #1a008a; font-weight: bold;">varying</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">8</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #1a008a; font-weight: bold;">update</span> cpp_polygons <span style="color: #1a008a; font-weight: bold;">set</span> postcode_no_sp <span style="color: #66cc66;">=</span> <span style="color: #1a008a; font-weight: bold;">replace</span><span style="color: #66cc66;">&#40;</span>postcode<span style="color: #66cc66;">,</span> <span style="color: #cc0000;">' '</span><span style="color: #66cc66;">,</span> <span style="color: #cc0000;">''</span><span style="color: #66cc66;">&#41;</span>; 
<span style="color: #808080; font-style: italic;">-- the above could take 10 - 15 mins</span>
<span style="color: #1a008a; font-weight: bold;">create</span> <span style="color: #1a008a; font-weight: bold;">index</span> pcns_index <span style="color: #1a008a; font-weight: bold;">on</span> cpp_polygons <span style="color: #66cc66;">&#40;</span>postcode_no_sp<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #1a008a; font-weight: bold;">vacuum</span> <span style="color: #1a008a; font-weight: bold;">analyze</span> cpp_polygons;</pre></div></div>
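<p>Your own postcodes need the same treatment before they will match <code>postcode_no_sp</code>. A minimal sketch in Ruby follows; the method name is hypothetical, and the upcasing and removal of other whitespace are my assumptions about what messy input might need (the <span class="caps">SQL </span>above only removes spaces).</p>

```ruby
# Hypothetical helper: strip all whitespace and upcase, so that
# 'ab10 1aa ' and 'AB10 1AA' both become 'AB101AA'.
def normalise_postcode(postcode)
  postcode.gsub(/\s+/, '').upcase
end

puts normalise_postcode(' ab10 1aa ')
```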




<p>You might also want to remove the original vertical street rows, where the postcode column begins with a &#8216;V&#8217;, from the cpp_polygons table &#8212; I didn&#8217;t have any need for this.</p>]]></content:encoded>
			<wfw:commentRss>http://blog.mackerron.com/2009/11/code-point-polygons-postgis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Signing Amazon Product Advertising API calls in Ruby</title>
		<link>http://blog.mackerron.com/2009/08/sign-aws-api-in-ruby/</link>
		<comments>http://blog.mackerron.com/2009/08/sign-aws-api-in-ruby/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 11:04:23 +0000</pubDate>
		<dc:creator>George</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Web design]]></category>

		<guid isPermaLink="false">http://blog.mackerron.com/?p=182</guid>
		<description><![CDATA[I have a simple site that generates covers for CDs I burn from iTunes purchases and so on (it pre-dates widespread use of JS libraries, and is in much need of prettifying). The site uses Amazon Product Advertising API calls to search and retrieve album cover art and track listings. Since earlier this month, such [...]]]></description>
			<content:encoded><![CDATA[<p>I have <a href="http://mackerron.com/cdcovers/">a simple site</a> that generates covers for CDs I burn from iTunes purchases and so on (it pre-dates widespread use of JS libraries, and is in much need of prettifying). The site uses <a href="https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html">Amazon Product Advertising <span class="caps">API</span></a> calls to search and retrieve album cover art and track listings. Since earlier this month, such <span class="caps">API </span>calls have to be cryptographically signed.</p>

<p>This is somewhat annoying: the site&#8217;s original design has it communicating independently with Amazon (using Amazon&#8217;s <span class="caps">XSLT API </span>feature to transform their <span class="caps">XML </span>data into <span class="caps">JSON</span>), and that&#8217;s no longer possible now that a private key is required. But it&#8217;s not unfixable. The site now sends its <span class="caps">API </span>call first to my server, which returns a signed version; the site then forwards the signed call on to Amazon.</p>

<p>I found most of what I needed for this on <a href="http://chrisroos.co.uk/blog/2009-01-31-implementing-version-2-of-the-amazon-aws-http-request-signature-in-ruby">Chris Roos&#8217; blog</a>, but his version still wasn&#8217;t quite working for me (the two problems I recall are that Ruby&#8217;s <span class="caps">CGI.</span>escape doesn&#8217;t quite follow Amazon&#8217;s requirements, and that times need converting to <span class="caps">GMT</span>).</p>

<p><span id="more-182"></span></p>
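<p>To make those two problems concrete: <code>CGI.escape</code> encodes a space as <code>+</code>, whereas the signing rules expect <code>%20</code>, and timestamps need to be expressed in <span class="caps">UTC</span>. The sketch below is illustrative only; <code>rfc3986_escape</code> and <code>aws_timestamp</code> are hypothetical names, and it uses only the standard library.</p>

```ruby
require 'cgi'
require 'time'

# Percent-encode every byte except the unreserved characters
# A-Z a-z 0-9 _ . ~ - (so a space becomes %20, never +).
def rfc3986_escape(s)
  s.to_s.gsub(/[^A-Za-z0-9_.~-]/) do |c|
    c.unpack('C*').map { |byte| format('%%%02X', byte) }.join
  end
end

# ISO 8601 timestamp in UTC; getutc leaves the original Time object untouched.
def aws_timestamp(time = Time.now)
  time.getutc.iso8601
end

puts CGI.escape('Abbey Road')      # space becomes +
puts rfc3986_escape('Abbey Road')  # space becomes %20
puts aws_timestamp
```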

<p>Anyway, in case you&#8217;re looking to do the same, here&#8217;s what I ended up with:</p>


<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#008000; font-style:italic;">#!/usr/bin/env ruby</span>
&nbsp;
<span style="color:#008000; font-style:italic;"># Note: You need hmac.rb and hmac-sha2.rb from http://deisui.org/~ueno/ruby/hmac.html </span>
<span style="color:#008000; font-style:italic;"># somewhere in your require paths. ruby-hmac is currently broken under Ruby 1.9.</span>
&nbsp;
<span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>rubygems cgi time hmac<span style="color:#006600; font-weight:bold;">-</span>sha2 base64<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>lib<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#CC0066; font-weight:bold;">require</span> lib <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
ACCESS_IDENTIFIER = <span style="color:#996600;">'YOUR_PUBLIC_ID'</span>
SECRET_IDENTIFIER = <span style="color:#996600;">'YOUR_PRIVATE_ID'</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">def</span> aws_escape<span style="color:#006600; font-weight:bold;">&#40;</span>s<span style="color:#006600; font-weight:bold;">&#41;</span>
  s.<span style="color:#CC0066; font-weight:bold;">gsub</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">/</span><span style="color:#006600; font-weight:bold;">&#91;</span>^A<span style="color:#006600; font-weight:bold;">-</span>Za<span style="color:#006600; font-weight:bold;">-</span>z0<span style="color:#006600; font-weight:bold;">-</span><span style="color:#006666;">9</span>_.~<span style="color:#006600; font-weight:bold;">-</span><span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">/</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#006600; font-weight:bold;">|</span>c<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#996600;">'%%%02X'</span> <span style="color:#006600; font-weight:bold;">%</span> c<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>  <span style="color:#008000; font-style:italic;"># zero-padded hex, e.g. %0A</span>
  <span style="color:#008000; font-style:italic;"># for 1.9, you'd replace [0] with .ord -- but ruby-hmac seems broken under 1.9</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
cgi = <span style="color:#CC00FF; font-weight:bold;">CGI</span>.<span style="color:#9900CC;">new</span>
params = cgi.<span style="color:#9900CC;">params</span>.<span style="color:#9900CC;">dup</span>
&nbsp;
amazon_endpoint = params.<span style="color:#9900CC;">delete</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'amazon_endpoint'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
amazon_path = params.<span style="color:#9900CC;">delete</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'amazon_path'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
js_callback = params.<span style="color:#9900CC;">delete</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'js_callback'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
signing_params = <span style="color:#006600; font-weight:bold;">&#123;</span>
  <span style="color:#996600;">'AWSAccessKeyId'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> ACCESS_IDENTIFIER,
  <span style="color:#996600;">'Timestamp'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#CC00FF; font-weight:bold;">Time</span>.<span style="color:#9900CC;">now</span>.<span style="color:#9900CC;">gmtime</span>.<span style="color:#9900CC;">iso8601</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
params.<span style="color:#9900CC;">merge</span>!<span style="color:#006600; font-weight:bold;">&#40;</span>signing_params<span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
canonical_querystring = params.<span style="color:#9900CC;">sort</span>.<span style="color:#9900CC;">collect</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>key, value<span style="color:#006600; font-weight:bold;">|</span> 
  <span style="color:#006600; font-weight:bold;">&#91;</span>aws_escape<span style="color:#006600; font-weight:bold;">&#40;</span>key.<span style="color:#9900CC;">to_s</span><span style="color:#006600; font-weight:bold;">&#41;</span>, aws_escape<span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_s</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">join</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'='</span><span style="color:#006600; font-weight:bold;">&#41;</span> 
<span style="color:#9966CC; font-weight:bold;">end</span>.<span style="color:#9900CC;">join</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'&amp;'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
string_to_sign = <span style="color:#996600;">&quot;GET<span style="color:#000099;">\n</span>#{amazon_endpoint}<span style="color:#000099;">\n</span>#{amazon_path}<span style="color:#000099;">\n</span>#{canonical_querystring}&quot;</span>
&nbsp;
hmac = <span style="color:#6666ff; font-weight:bold;">HMAC::SHA256</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>SECRET_IDENTIFIER<span style="color:#006600; font-weight:bold;">&#41;</span>
hmac.<span style="color:#9900CC;">update</span><span style="color:#006600; font-weight:bold;">&#40;</span>string_to_sign<span style="color:#006600; font-weight:bold;">&#41;</span>
signature = <span style="color:#CC00FF; font-weight:bold;">Base64</span>.<span style="color:#9900CC;">encode64</span><span style="color:#006600; font-weight:bold;">&#40;</span>hmac.<span style="color:#9900CC;">digest</span><span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#CC0066; font-weight:bold;">chomp</span>
&nbsp;
params<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'Signature'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = signature
querystring = params.<span style="color:#9900CC;">sort</span>.<span style="color:#9900CC;">collect</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>key, value<span style="color:#006600; font-weight:bold;">|</span> 
  <span style="color:#006600; font-weight:bold;">&#91;</span>aws_escape<span style="color:#006600; font-weight:bold;">&#40;</span>key.<span style="color:#9900CC;">to_s</span><span style="color:#006600; font-weight:bold;">&#41;</span>, aws_escape<span style="color:#006600; font-weight:bold;">&#40;</span>value.<span style="color:#9900CC;">to_s</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">join</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'='</span><span style="color:#006600; font-weight:bold;">&#41;</span> 
<span style="color:#9966CC; font-weight:bold;">end</span>.<span style="color:#9900CC;">join</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'&amp;'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
signed_url = <span style="color:#996600;">&quot;http://#{amazon_endpoint}#{amazon_path}?#{querystring}&quot;</span>
&nbsp;
cgi.<span style="color:#9900CC;">out</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'type'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">'text/javascript'</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> <span style="color:#996600;">&quot;#{js_callback}('#{signed_url}');&quot;</span> <span style="color:#006600; font-weight:bold;">&#125;</span></pre></div></div>
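
<p>Since the script above depends on ruby-hmac, here is a hedged sketch of the same signing steps using only Ruby&#8217;s standard library (OpenSSL for the <span class="caps">HMAC</span>), written for present-day Ruby rather than 1.8. The method names, the fixed timestamp and the placeholder keys are illustrative assumptions, not part of the script above, and the output has not been checked against Amazon&#8217;s servers.</p>

```ruby
require 'openssl'

# Percent-encode every byte outside the RFC 3986 unreserved set.
def aws_escape(value)
  value.to_s.b.gsub(/[^A-Za-z0-9_.~-]/) { |byte| format('%%%02X', byte.ord) }
end

# Sort the parameters by name and join them as escaped key=value pairs.
def canonical_querystring(params)
  params.sort.map { |key, value| "#{aws_escape(key)}=#{aws_escape(value)}" }.join('&')
end

# Base64-encoded HMAC-SHA256 of the string to sign ('m0' means no line breaks).
def aws_signature(secret, host, path, params)
  string_to_sign = ['GET', host, path, canonical_querystring(params)].join("\n")
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), secret, string_to_sign)
  [digest].pack('m0')
end

# Add the key ID and timestamp, sign, then append the signature to the query.
def signed_url(access_key, secret, host, path, params, timestamp)
  signing = params.merge('AWSAccessKeyId' => access_key, 'Timestamp' => timestamp)
  signature = aws_signature(secret, host, path, signing)
  query = canonical_querystring(signing.merge('Signature' => signature))
  "http://#{host}#{path}?#{query}"
end

# Placeholder credentials and a fixed timestamp, for illustration only.
url = signed_url('YOUR_PUBLIC_ID', 'YOUR_PRIVATE_ID', 'ecs.amazonaws.com', '/onca/xml',
                 { 'Service' => 'AWSECommerceService', 'Operation' => 'ItemSearch',
                   'SearchIndex' => 'Books', 'Keywords' => 'george monbiot' },
                 '2009-08-01T12:00:00Z')
puts url
```

<p>In real use the timestamp would come from Time.now.utc.iso8601 rather than a fixed string; it is passed in here only so that the output is repeatable.</p>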




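<p>You can test this locally by running the script from the command line. Ruby&#8217;s <span class="caps">CGI</span> library then falls back to offline mode and reads name=value parameters from standard input until you press Ctrl-D. These, for example:</p>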



<pre>amazon_endpoint=ecs.amazonaws.com
amazon_path=/onca/xml
js_callback=do_stuff
Service=AWSECommerceService
Version=2009-03-31
Operation=ItemSearch
SearchIndex=Books
Keywords=george+monbiot</pre>]]></content:encoded>
			<wfw:commentRss>http://blog.mackerron.com/2009/08/sign-aws-api-in-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
