database

The Shapefile 2.0 manifesto

Geographic Information Systems (GIS) are by their nature data driven. The data comes in a wide variety of raster and vector formats. Rasters hold raw, continuous data recorded straight from the real world. An example is satellite or aerial imagery, which is commonly held in an open format with broad support, such as GeoTIFF or GeoJPEG.

Vector formats hold refined, discrete data, which has been manually traced or otherwise derived from other data sources. Examples include building outlines, contours, road routes, pipe networks, land parcels and locations. Vector data is usually traced or derived, at great expense, from raster data to encode business information; as a result it’s usually highly valuable.

Unfortunately, there are many GIS vector file formats, and most are proprietary. They can only be used to their full potential in their native software. Three of the biggest are AutoCAD DXF, MapInfo TAB and ArcGIS Personal Geodatabase. One vector format is unique in being both an open standard and in wide use: Shapefile.

Shapefile is publicly documented in the ESRI Shapefile Technical Description by ESRI Inc., its creator. Any GIS software worth its salt can read and write the format, so it’s become the least common denominator. It is the format for storing and exchanging vector data between teams, departments, businesses and government. In my opinion this makes Shapefile the best thing ever to happen to GIS; without it the GIS market would be a fraction of its current size.
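
Because the format is publicly documented, even the Python standard library is enough to inspect a file. Below is a minimal sketch, assuming the fixed 100-byte main file header laid out in the Technical Description. The function name `parse_shp_header` and the sample values are hypothetical, made up for illustration only.

```python
import struct


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # Bytes 0-27: seven big-endian int32s (file code, five unused, file length).
    file_code, *_unused, length_words = struct.unpack(">7i", header[:28])
    if file_code != 9994:
        raise ValueError("not a Shapefile: unexpected file code")
    # Bytes 28-35: little-endian int32 version and shape type.
    version, shape_type = struct.unpack("<2i", header[28:36])
    # Bytes 36-99: eight little-endian doubles
    # (Xmin, Ymin, Xmax, Ymax, Zmin, Zmax, Mmin, Mmax).
    bounds = struct.unpack("<8d", header[36:100])
    return {
        "length_bytes": length_words * 2,  # file length is stored in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox": bounds[:4],
    }


# A synthetic header (invented values): an empty polygon file, shape type 5.
sample = (
    struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)
    + struct.pack("<2i", 1000, 5)
    + struct.pack("<8d", 0.0, 0.0, 10.0, 20.0, 0.0, 0.0, 0.0, 0.0)
)
info = parse_shp_header(sample)
```

With a real file, one would pass the first 100 bytes read from the .shp in binary mode. Note that the header mixes big-endian and little-endian fields, one of the format's quirks.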

Notes on using sderaster to mosaic data into ArcSDE

For those who aren’t familiar, ArcSDE is server software that sits atop a database to spatially enable it. The resulting geodatabase is able to store geographic features (e.g. roads, buildings, endangered habitats) along with more common SQL data types. ArcSDE can also store georeferenced rasters such as scanned plans/maps or satellite/aerial imagery.

To load raster data as a continuous layer one typically mosaics many images, using ArcGIS Desktop or the sderaster command. ArcGIS Desktop is more flexible: it accepts many image formats and can resample images that don’t perfectly align, but it’s slow and struggles with large jobs. The sderaster command is faster and scriptable, but it accepts only TIFFs and is very fussy about them.

Tabular data with Python slides, from PyCon UK

At PyCon UK 2007 I gave a short talk on using Python to deal with tabular data. The slides, demos and modules for my talk are available for download.

There is material in the download that I didn’t present on the day. The talk covers extracting data from various tabular data formats, which is the first step of an Extract, Transform, Load (ETL) operation. It also summarises the character of those data formats.
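
To give a flavour of the extract step, here is a minimal sketch using the standard library's `csv` module. It is not taken from the talk materials. The function name `extract_rows` and the sample data are hypothetical, and it assumes well-formed rows with one value per header column.

```python
import csv
import io


def extract_rows(source):
    """Yield each data row of a CSV source as a dict keyed by the header row."""
    reader = csv.DictReader(source)
    for row in reader:
        # csv returns every value as a string; trim stray whitespace here
        # and leave type conversion to the later transform step.
        yield {key: value.strip() for key, value in row.items()}


# Invented sample data standing in for a real file opened with newline="".
sample = io.StringIO("id,name,length_m\n1,High Street, 120.5\n2,Mill Lane,80\n")
rows = list(extract_rows(sample))
```

Keeping extraction separate from type conversion means the same transform code can be reused whichever tabular format the rows came from.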

Comments, queries and suggestions are most welcome.