There’s a lot of interesting data out there that isn’t points, but regions. In Chapter 10 of the book, we showed you how to calculate the area of a polygon (see this demo); unfortunately, there just wasn’t space to cover more region-centric topics.
The current state of the API leaves you pretty much adrift when it comes to actually plotting region data. Many mashups do a good job of faking it with polylines, but polylines and polygons are really very different beasts:
- A polygon is a closed loop, so you should have a way to colour its interior; and
- Polygons representing anything even slightly complex will have vertex counts in the thousands rather than the dozens.
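The first point is what separates a polygon from a polyline. Once the ring is closed, its interior is well defined, and that is also what makes an area calculation possible.

As a small illustration, here is the standard shoelace formula in Python. This is not the book's Chapter 10 code, which isn't reproduced here. `close_ring` and `planar_area` are hypothetical helper names. The result is in squared coordinate units, so feeding it raw longitude/latitude gives square degrees, not a real-world area.

```python
def close_ring(points):
    """Return the vertices with the first point repeated at the end, if it isn't already."""
    points = list(points)
    if points and points[0] != points[-1]:
        points.append(points[0])
    return points


def planar_area(points):
    """Shoelace formula over (x, y) pairs, treating the coordinates as a flat plane.

    Only meaningful for a closed, non-self-intersecting ring; the ring is
    closed here first, so callers may pass it open or closed.
    """
    ring = close_ring(points)
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1  # signed area of each edge's trapezoid pair
    return abs(twice_area) / 2.0
```

For example, `planar_area([(0, 0), (4, 0), (0, 3)])` gives `6.0`, the area of a 4-by-3 right triangle.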
It’s that second point, I expect, that has made the Maps team antsy about building polygon calls directly into the API; they don’t want to have to explain to Joe Mashup Author why he can’t pass the 179,354 points that define the outline of Alaska through a URL querystring.
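To put a rough number on the querystring problem, the sketch below counts the characters needed to spell out Alaska's vertices. It makes two assumptions:

- a hypothetical encoding, with each vertex written as `lat,lon` to eight decimal places (like the GPX excerpt later in this post) and vertices separated by `|`; and
- every vertex taking as many characters as the sample one.

```python
def querystring_length(n_points, lat=70.95909119, lon=-157.47343445):
    """Characters needed to write n_points vertices as 'lat,lon' pairs joined by '|'.

    Hypothetical encoding for illustration only: every vertex is assumed to be
    as long as the sample vertex, formatted to 8 decimal places.
    """
    pair = f"{lat:.8f},{lon:.8f}"  # "70.95909119,-157.47343445" is 25 characters
    separators = max(n_points - 1, 0)
    return n_points * len(pair) + separators


print(querystring_length(179_354))  # 4663203
```

That comes to roughly 4.7 million characters. By comparison, Internet Explorer caps URLs at 2,083 characters.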
But enough of that; we’re here to show you how you can do regions to your heart’s content. School districts and election seats are just the start: there are mountains of data out there presented by country or by state/province, just waiting for some map love. (Not to mention boundaries based on ecological, social, environmental, or economic divisions.)
The Format Game
Between GML and the TIGER/Line data, you’d think there’d be enough variety already, but the format we’re looking at here is the Shapefile. Shapefiles store vector data in binary form, so trying to work through one without some kind of preprocessing would be extremely hairy.
Fortunately, Bryce Nesbitt and Frank Warmerdam have already done the heavy lifting, having created a set of tools for converting Shapefiles into rational, text-based formats.
Nesbitt includes executables for several platforms in his download, so you likely don’t even need to compile anything. If you’ve got shell access to your webspace, you can download a shapefile directly to it and process it in place. Alternatively, it’s just as easy to do all this on a local machine. Check it out:
wget http://edcftp.cr.usgs.gov/pub/data/nationalatlas/statesp020.tar.gz
tar xvfpz statesp020.tar.gz
wget http://www.obviously.com/gis/shp2text/shp2text.zip
unzip shp2text.zip
./shp2text --gpx statesp020.shp 3 4 > output.gpx
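What did I do? The first line downloads a gzipped tarball of US state outlines from this fantastic page, and the second uses UNIX’s tar utility to unpack it. The third and fourth lines grab and unzip the shp2text program, and the final line sets it to work on our shapefile. Judging by the output below, the 3 and 4 appear to select the attribute columns that end up in each route’s name and cmt fields.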
Of the three options offered by shp2text, I thought GPX looked the most promising. As an XML format, I knew it would be at least somewhat self-describing; hopefully I’d be able to just open it up and get a feel for what it was all about.
Sure enough, look at how output.gpx starts out:
<rte><number>0</number><name>Alaska</name><cmt>02</cmt>
<rtept lat=" 70.95909119" lon="-157.47343445"></rtept>
<rtept lat=" 70.96421051" lon="-157.46252441"></rtept>
<rtept lat=" 70.97583771" lon="-157.42974854"></rtept>
<rtept lat=" 70.98235321" lon="-157.41198730"></rtept>
<rtept lat=" 70.98793793" lon="-157.39967346"></rtept>
...
That can’t be too bad; it’s just a bunch of rte blocks wrapped around lists of points. Of course, we can’t just point SimpleXML at a 23 MB file, because SimpleXML builds the entire document tree in memory at once. By using PHP’s SAX-style, event-based parser instead, we can stream through all that data and get it into a database.
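The importer itself is PHP, and its source isn't reproduced here. So, as a stand-in, here is a minimal sketch of the same streaming idea using Python's built-in `xml.sax` module.

It assumes the element names shown in the excerpt above (`rte`, `name`, `cmt`, and `rtept` with `lat`/`lon` attributes) and ignores everything else, including `<number>`. `RouteHandler` and its `on_route` callback are hypothetical names.

```python
import xml.sax


class RouteHandler(xml.sax.ContentHandler):
    """Streams <rte> blocks out of shp2text's GPX output, one route at a time.

    Each finished route is handed to on_route(name, cmt, points), where points
    is a list of (lat, lon) floats. Only the current route is held in memory,
    so memory use depends on the largest single route, not the file size.
    """

    def __init__(self, on_route):
        super().__init__()
        self.on_route = on_route
        self._in_rte = False
        self._text = []  # character data for the element currently being read
        self._name = None
        self._cmt = None
        self._points = []

    def startElement(self, tag, attrs):
        if tag == "rte":
            # New route: start fresh containers (the old list was handed off).
            self._in_rte = True
            self._name, self._cmt, self._points = None, None, []
        elif tag == "rtept" and self._in_rte:
            # float() tolerates the leading space seen in lat=" 70.95909119".
            self._points.append((float(attrs["lat"]), float(attrs["lon"])))
        self._text = []

    def characters(self, content):
        # The parser may deliver an element's text in several chunks.
        self._text.append(content)

    def endElement(self, tag):
        if self._in_rte and tag == "name":
            self._name = "".join(self._text).strip()
        elif self._in_rte and tag == "cmt":
            self._cmt = "".join(self._text).strip()
        elif tag == "rte":
            self.on_route(self._name, self._cmt, self._points)
            self._in_rte = False
        self._text = []
```

To run it over the whole file, pass a callback to `xml.sax.parse("output.gpx", RouteHandler(callback))`. The callback is invoked once per route, so each route can be written to the database as soon as its closing tag arrives.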
XML to SQL
I set up a new database in MySQL and created two tables: one holds the individual vertex points, and the other holds the polygons they belong to, each tagged with a code. Technically, I probably could have gotten away with just the one, but it’s good to have those bounding-box values to query against. All but the simplest states (geometrically, of course) are built from many polygons. There are over 3,000 records in my shape_polygons table, and that’s just 50 states’ worth of points.
Here are the table definitions:
CREATE TABLE `shape_polygons` (
  `id` int(11) NOT NULL auto_increment,
  `latitude_min` float NOT NULL default '-90',
  `latitude_max` float NOT NULL default '90',
  `longitude_min` float NOT NULL default '-180',
  `longitude_max` float NOT NULL default '180',
  `code` varchar(32) NOT NULL,
  `source` varchar(32) default NULL,
  PRIMARY KEY (`id`),
  KEY `code` (`code`)
) ENGINE=MyISAM;

CREATE TABLE `shape_vertices` (
  `id` int(11) NOT NULL auto_increment,
  `polygon_id` int(11) NOT NULL,
  `ordering` int(11) NOT NULL,
  `latitude` float NOT NULL,
  `longitude` float NOT NULL,
  `elevation` float default NULL,
  PRIMARY KEY (`id`),
  KEY `polygon_id` (`polygon_id`),
  KEY `ordering` (`ordering`)
) ENGINE=MyISAM;
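For each route, the importer has to do three things: compute the route's bounding box, write one `shape_polygons` row, and write the vertices with their `ordering`. The sketch below shows that, plus the kind of bounding-box query those min/max columns make possible.

A few assumptions to be aware of:

- It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere.
- Table and column names match the definitions above, but the types are simplified.
- `store_polygon` and `polygons_in_view` are hypothetical helpers, not functions from Import.php.

One real difference worth knowing: SQLite's REAL is double precision, whereas MySQL's FLOAT is single precision, good to only about seven significant digits.

```python
import sqlite3

# SQLite stand-in for the MySQL tables above: same table and column names,
# simplified types, no defaults. An illustration, not the author's schema.
SCHEMA = """
CREATE TABLE shape_polygons (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    latitude_min REAL NOT NULL,
    latitude_max REAL NOT NULL,
    longitude_min REAL NOT NULL,
    longitude_max REAL NOT NULL,
    code TEXT NOT NULL,
    source TEXT
);
CREATE INDEX shape_polygons_code ON shape_polygons (code);
CREATE TABLE shape_vertices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    polygon_id INTEGER NOT NULL,
    ordering INTEGER NOT NULL,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    elevation REAL
);
CREATE INDEX shape_vertices_polygon ON shape_vertices (polygon_id);
"""


def store_polygon(conn, code, points, source=None):
    """Insert one polygon (a list of (lat, lon) pairs) and return its new id.

    The bounding box is computed from the vertices; each vertex keeps its
    position in the ring via the `ordering` column.
    """
    if not points:
        raise ValueError("a polygon needs at least one vertex")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    cur = conn.execute(
        "INSERT INTO shape_polygons"
        " (latitude_min, latitude_max, longitude_min, longitude_max, code, source)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (min(lats), max(lats), min(lons), max(lons), code, source),
    )
    polygon_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO shape_vertices (polygon_id, ordering, latitude, longitude)"
        " VALUES (?, ?, ?, ?)",
        [(polygon_id, i, lat, lon) for i, (lat, lon) in enumerate(points)],
    )
    return polygon_id


def polygons_in_view(conn, south, west, north, east):
    """(id, code) of every polygon whose bounding box overlaps the viewport.

    Plain interval-overlap test on each axis; viewports that cross the
    180th meridian are not handled.
    """
    return conn.execute(
        "SELECT id, code FROM shape_polygons"
        " WHERE latitude_max >= ? AND latitude_min <= ?"
        " AND longitude_max >= ? AND longitude_min <= ?"
        " ORDER BY id",
        (south, north, west, east),
    ).fetchall()
```

Usage, step by step:

1. Open a connection (for example `sqlite3.connect(":memory:")`) and run `conn.executescript(SCHEMA)`.
2. Call `store_polygon(conn, code, points)` once per route, then `conn.commit()`. Here `code` is assumed to be the route's cmt value, such as 02 for Alaska.
3. On the map side, call `polygons_in_view` with the current viewport to find which polygons' vertices are worth fetching.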
And here’s the source for my importer. If you need help following it, check out PHP’s documentation on SAX-style XML parsing; the callback-based approach can be a little unintuitive if you haven’t seen it before.
Anyhow, best of luck with this! Watch for upcoming articles explaining how to turn this information into swanky tilesets and overlays. (And as always, please report any sweet data sources you find, especially those with global scope!)
Importer source: Import.php
Shapefile tools: shp2text
U.S. State Outlines: statesp020.tar.gz