<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Misspelled nemesis club &#187; database</title>
	<atom:link href="http://moreati.org.uk/blog/category/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://moreati.org.uk/blog</link>
	<description>A blog about life, technology &#38; databases</description>
	<lastBuildDate>Sat, 10 Jul 2010 16:57:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>The Shapefile 2.0 manifesto</title>
		<link>http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/</link>
		<comments>http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 13:13:44 +0000</pubDate>
		<dc:creator>Alex Willmer</dc:creator>
				<category><![CDATA[arcgis]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[rants]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://moreati.org.uk/blog/?p=54</guid>
		<description><![CDATA[Geographic Information Systems (GIS) are by their nature data driven. The data comes in a wide variety of raster and vector formats. Rasters hold raw, continuous data recorded straight from the real world. An example is satellite/aerial imagery, which is commonly held in an open format with broad support, such as GeoTIFF or GeoJPEG.]]></description>
			<content:encoded><![CDATA[<p>Geographic Information Systems (GIS) are by their nature data driven. The data comes in a wide variety of raster and vector formats. Rasters hold raw, continuous data recorded straight from the real world. An example is satellite/aerial imagery, which is commonly held in an open format with broad support, such as <a href="http://en.wikipedia.org/wiki/GeoTIFF">GeoTIFF</a> or <a href="http://en.wikipedia.org/wiki/JPEG">GeoJPEG</a>.</p>
<p><a href="http://en.wikipedia.org/wiki/GIS_file_formats#Vector_formats">Vector formats</a> hold refined, discrete data, which has been manually traced or otherwise derived from other data sources. Examples include building outlines, contours, road routes, pipe networks, land parcels and locations. Vector data is usually traced or derived, at great expense, from raster data to encode business information &#8211; as a result it&#8217;s usually highly valuable.</p>
<p>Unfortunately, there are many GIS vector file formats, and most are proprietary. They can only be used to their full potential in their native software. Three of the biggest are AutoCAD <a href="http://en.wikipedia.org/wiki/AutoCAD_DXF">DXF</a>, MapInfo <a href="http://en.wikipedia.org/wiki/MapInfo_TAB_format">TAB</a> and ArcGIS <a href="http://en.wikipedia.org/w/index.php?title=Personal_Geodatabase">Personal Geodatabase</a>. One vector format is unique &#8211; both an open standard and in wide use: Shapefile.</p>
<p><a href="http://en.wikipedia.org/wiki/Shapefile">Shapefile</a> is publicly documented in the <a href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf">ESRI Shapefile Technical Description</a> by <a href="http://www.esri.com">ESRI Inc.</a>, its creator. Any GIS software worth its salt can read and write the format, so it&#8217;s become the least common denominator. It is <em>the</em> format for storing and exchanging vector data between teams, departments, businesses and government. In my opinion this makes Shapefile the best thing ever to happen to GIS; without it the GIS market would be a fraction of its current size.<span id="more-54"></span></p>
<p>Despite its popularity, Shapefile does have some serious limitations, mainly due to its DBF heritage:</p>
<ul>
<li>A shapefile is limited to <span style="text-decoration: line-through;">2</span> 4 GB or <span style="text-decoration: line-through;">65535</span> 4 billion/len(record) records.<br />
Where len(record) is the greater of the average feature length in bytes and the length of a DBF record.</li>
<li>Records are limited to <span style="text-decoration: line-through;">1000</span> 65536 bytes or <span style="text-decoration: line-through;">32</span> between 257 &amp; 2038 fields.</li>
<li>Field names are limited to <span style="text-decoration: line-through;">8</span> 10 characters, character fields can hold up to 254 bytes.</li>
<li>Unicode is <span style="text-decoration: line-through;">not supported</span> not widely supported.</li>
</ul>
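<p>To make the field name and character field limits above concrete, here is a minimal Python sketch that checks a proposed attribute schema against them. The function name and the sample fields are hypothetical, made up for illustration, and the two constants are simply the figures quoted in the list. It is not taken from any Shapefile library.</p>
<pre>
MAX_NAME_LEN = 10     # field name limit quoted in the list above
MAX_CHAR_BYTES = 254  # character field limit quoted in the list above

def check_fields(fields):
    '''Return a list of problems for (name, width_in_bytes) pairs.'''
    problems = []
    for name, width in fields:
        if len(name) not in range(1, MAX_NAME_LEN + 1):
            problems.append('%s: name must be 1 to %d characters' % (name, MAX_NAME_LEN))
        if width not in range(1, MAX_CHAR_BYTES + 1):
            problems.append('%s: width must be 1 to %d bytes' % (name, MAX_CHAR_BYTES))
    return problems

# Hypothetical schema: the second field breaks both limits
print(check_fields([('ROAD_NAME', 80), ('CLASSIFICATION', 300)]))</pre>
<p>Running a check like this before export shows which columns would need to be renamed or truncated.</p>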
<p>Currently the only real alternative, for data exchange, is <a href="http://en.wikipedia.org/wiki/Geography_Markup_Language">Geography Markup Language (GML)</a> as defined by the <a href="http://www.opengeospatial.org/">Open Geospatial Consortium (OGC)</a>. An XML dialect, GML has none of the limitations of Shapefile, which is why Ordnance Survey uses GML to supply <a href="http://www.ordnancesurvey.co.uk/oswebsite/products/osmastermap/">MasterMap</a>, a highly detailed vector map of Great Britain. Support for GML in software is growing, but it&#8217;s unsuitable as a storage format.</p>
<p>Viewing and editing vector data requires support for random access by attribute and by spatial extent. As an XML dialect, GML cannot do this: to find one record, the entire file must be parsed from beginning to end. GML is almost always converted to another format, or loaded into a spatial database, before it is used.</p>
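<p>The cost of sequential access can be sketched in a few lines of Python using the standard library XML parser. The document built here is a tiny stand-in with made-up element names, not real GML, and find_feature is a hypothetical helper. The point is only that nothing lets the parser jump straight to a record: everything in front of it has to be read first, and a record that is missing costs a full pass.</p>
<pre>
import io
import xml.etree.ElementTree as ET

# Build a small stand-in document; element names are hypothetical, not real GML
root = ET.Element('features')
for number in range(1, 6):
    feature = ET.SubElement(root, 'feature', id=str(number))
    feature.text = 'parcel %d' % number
document = io.BytesIO(ET.tostring(root))

def find_feature(stream, wanted_id):
    '''Scan the stream from the start; return (text, elements_read).'''
    elements_read = 0
    # iterparse reports each element once its closing tag has been read
    for event, element in ET.iterparse(stream):
        elements_read += 1
        if element.tag == 'feature' and element.get('id') == wanted_id:
            return element.text, elements_read
    return None, elements_read

print(find_feature(document, '4'))</pre>
<p>The count returned alongside the text grows with the position of the record in the file, which is exactly what an indexed format avoids.</p>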
<p>A spatial database is a database with data types and functions able to handle geospatial data. For the major databases there are <a href="http://www.oracle.com/technology/products/spatial/index.html">Oracle Spatial</a>, <a href="http://postgis.refractions.net/">PostgreSQL PostGIS</a>, <a href="http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx">SQL Server Spatial</a>, <a href="http://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html">MySQL Spatial</a> and <a href="http://www-01.ibm.com/software/data/spatial/">DB2 Spatial Extender</a>. All are based on <a href="http://en.wikipedia.org/wiki/Simple_Features">Simple Features for SQL</a>, an open standard, meaning spatial data can be queried and updated with SQL like any other data type.</p>
<p>I believe that a portable, standalone spatial database would make a very good successor to Shapefile. Such a format would drive the GIS market forward, increasing usage of GIS by making it easier to edit, publish and share GIS data. A portable spatial database would negate the need for the import, view, edit, export cycle that GML imposes.</p>
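<p>As a rough sketch of what querying by attribute and by extent looks like in SQL, the Python below uses the sqlite3 module from the standard library. It is deliberately simplified: the table, its columns and the sample rows are hypothetical, and the points are stored as plain x and y numbers rather than Simple Features geometry types, so it shows the idea rather than any of the products listed above.</p>
<pre>
import sqlite3

# Hypothetical schema: plain numeric columns, not Simple Features types
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE places (name TEXT, kind TEXT, x REAL, y REAL)')
conn.execute('CREATE INDEX places_xy ON places (x, y)')
conn.executemany(
    'INSERT INTO places VALUES (?, ?, ?, ?)',
    [('Depot', 'building', 10.0, 20.0),
     ('Pump 7', 'pipe', 55.0, 60.0),
     ('Office', 'building', 12.5, 21.0)])

def find_in_extent(conn, kind, xmin, ymin, xmax, ymax):
    '''Return names of one kind of place inside a rectangular extent.'''
    rows = conn.execute(
        'SELECT name FROM places WHERE kind = ? '
        'AND x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY name',
        (kind, xmin, xmax, ymin, ymax))
    return [row[0] for row in rows]

print(find_in_extent(conn, 'building', 0, 0, 15, 25))</pre>
<p>A real spatial database replaces the two BETWEEN clauses with geometry functions and a spatial index, but the query still reads like any other SQL.</p>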
<p>At the moment I see 3 contenders for the crown:</p>
<ol>
<li><a href="http://www.esri.com/software/arcgis/geodatabase/about/file-gdbs.html">File Geodatabase</a> is a format from ESRI, natively supported by ArcGIS. ESRI proclaim it &#8220;Allow[s] users to easily exchange geodatabases.&#8221; That is true only if both users are running ESRI&#8217;s ArcGIS software. File Geodatabase is a proprietary format, despite promises by ESRI when it was launched.</li>
<li><a href="http://fdo.osgeo.org/fdosdf/index.html">Spatial Data Format</a> (SDF) is a format from Autodesk, which supports it natively. Support is included as part of their Feature Data Objects library, released as Open Source. SDF is based on the popular SQLite embedded database engine.</li>
<li><a href="http://www.gaia-gis.it/spatialite/">Spatialite</a> is another format based on SQLite, by Alessandro Furieri. Spatialite is still in its infancy; its first release was 11 months ago.</li>
</ol>
<p>Unfortunately none of these looks like it will become a clear winner any time soon. Each is supported by only one application currently. If ESRI releases the specification for File Geodatabase, I expect it will quickly gain widespread support due to their position as market leader. As open source applications such as <a href="http://lists.osgeo.org/pipermail/qgis-developer/2009-January/005791.html">QGIS  gain Spatialite support</a>, it could slowly achieve dominance in a grass roots fashion. SDF seems to be going nowhere.</p>
<p>So ESRI, please publish the details of File Geodatabase. At its launch, during the 2006 ESRI User Conference, you promised that File Geodatabase would be an interoperable format. You promised to release a software library, so we could read and write them without ArcGIS. Neither has happened. So File Geodatabase is just another closed format, another pretender to the throne that&#8217;s achieved only 1% of its true potential.</p>
<p>Publish File Geodatabase, or we&#8217;ll take the Shapefile crown by force.</p>
<p>Update 27 Mar 2009: Corrected Shapefile limits, based on <a href="http://www.clicketyclick.dk/databases/xbase/format/dbf.html">Xbase file structure</a> rather than <a href="http://www.clicketyclick.dk/databases/xbase/format/dbase_spec.html">dBASE software specifications</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Notes on using sdelayer to mosaic data into ArcSDE</title>
		<link>http://moreati.org.uk/blog/2008/03/27/notes-on-using-sdelayer-to-mosaic-data-into-arcsde/</link>
		<comments>http://moreati.org.uk/blog/2008/03/27/notes-on-using-sdelayer-to-mosaic-data-into-arcsde/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 23:30:39 +0000</pubDate>
		<dc:creator>Alex Willmer</dc:creator>
				<category><![CDATA[arcgis]]></category>
		<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://www.moreati.org.uk/blog/2008/03/27/notes-on-using-sdelayer-to-mosaic-data-into-arcsde/</guid>
		<description><![CDATA[For those who aren&#8217;t familiar, ArcSDE is server software that sits atop a database to spatially enable it. The resulting geodatabase is able to store geographic features (e.g. roads, buildings, endangered habitats) along with more common SQL data types. ArcSDE can also store georeferenced rasters such as scanned plans/maps or satellite/aerial imagery. To load raster]]></description>
			<content:encoded><![CDATA[<p>For those who aren&#8217;t familiar, ArcSDE is server software that sits atop a database to spatially enable it. The resulting geodatabase is able to store geographic features (e.g. roads, buildings, endangered habitats) along with more common SQL data types. ArcSDE can also store georeferenced rasters such as scanned plans/maps or satellite/aerial imagery.</p>
<p>To load raster data as a continuous layer one typically mosaics many images, using ArcGIS Desktop or the sderaster command. ArcGIS Desktop is more flexible: it accepts many image formats and can resample images that don&#8217;t perfectly align, but it&#8217;s slow and struggles with large jobs. The sderaster command is faster and scriptable, but it accepts only tiffs and it&#8217;s <em>very fussy</em> about them.<br />
<span id="more-25"></span><br />
What follows are some notes that will remind me the next time and possibly help others.</p>
<ol>
<li>If the real world pixel size (i.e. units m/px or ft/px) on lines 1 &amp; 4 of the world file is not exactly equal to the value you calculate, then correct the world files. If present, also correct the geotiff headers. Ordnance Survey 10K tiles may need this correction.</li>
<li>Pyramiding is only visually effective if it can create pixels with averaged colour values. Monochrome or colour mapped images do not satisfy this requirement.</li>
<li>The documentation states that ArcSDE cannot mosaic images with a colour map. This also applies in some cases where no colour map is present, such as 4 bit greyscale. Use the -N switch to be safe and then reapply the colour map with sderaster -o colormap, once all images are mosaiced.</li>
<li>Geotiff headers are tricky to remove in place, en masse. When preprocessing images be sure that the world file and/or geotiff tags are correct, or generate your tiffs without geotiff tags.</li>
<li>GDAL is much faster than ImageMagick for altering bit depth. The libtiff utilities are faster yet.</li>
<li>Until ArcSDE 8.3 SP, sderaster required that the top left image be loaded into a raster first, or it wouldn&#8217;t mosaic anything.</li>
</ol>
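<p>For the first note, here is a minimal Python sketch of the correction itself, working on the six lines of a world file held as strings. The function name and the sample values are hypothetical. It assumes the usual layout where line 1 is the pixel width and line 4 the pixel height, and it keeps the sign of each original value because line 4 is normally negative for a north-up image.</p>
<pre>
import math

def fix_pixel_size(lines, size):
    '''Return world file lines with lines 1 and 4 set to exactly size.

    The sign of each original value is kept, since line 4 is normally
    negative for a north-up image.
    '''
    fixed = list(lines)
    for index in (0, 3):  # lines 1 and 4, counting from zero
        fixed[index] = repr(math.copysign(size, float(lines[index])))
    return fixed

# Hypothetical tile whose pixel size is very slightly off 2.5 units
world = ['2.4999999', '0.0', '0.0', '-2.5000001', '400001.25', '299998.75']
print(fix_pixel_size(world, 2.5))</pre>
<p>The rotation terms and the origin on the other four lines are passed through untouched.</p>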
<p>Those wishing to save the effort of loading images into ArcSDE, might wish to look into ArcGIS ImageServer or a WMS server.</p>
<p>To update world files in a batch try the following python script:</p>
<pre>
import sys
import fileinput
import re
import glob

def replace_in_files(filenames, pattern, rterm):
    '''Replace all occurrences of regex pattern with rterm in sequence of filenames'''
    regex = re.compile(pattern)
    # With inplace=True, fileinput redirects stdout into the file being read
    for line in fileinput.input(filenames, inplace=True):
        sys.stdout.write(regex.sub(rterm, line))

if __name__ == '__main__':
    pattern = sys.argv[1]
    rterm = sys.argv[2]
    filenames = glob.glob(sys.argv[3])

    replace_in_files(filenames, pattern, rterm)</pre>
]]></content:encoded>
			<wfw:commentRss>http://moreati.org.uk/blog/2008/03/27/notes-on-using-sdelayer-to-mosaic-data-into-arcsde/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tabular data with Python slides, from PyCon UK</title>
		<link>http://moreati.org.uk/blog/2007/09/30/tabular-data-with-python-slides-from-pyconuk/</link>
		<comments>http://moreati.org.uk/blog/2007/09/30/tabular-data-with-python-slides-from-pyconuk/#comments</comments>
		<pubDate>Sun, 30 Sep 2007 22:42:01 +0000</pubDate>
		<dc:creator>Alex Willmer</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[pyconuk]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://alexw.webfactional.com/blog/?p=14</guid>
		<description><![CDATA[At PyCon UK 2007 I gave a short talk on using Python to deal with tabular data. The slides, demos and modules for my talk are available for download. There is material in the download that I didn&#8217;t present on the day. The talk covers extracting data from various tabular data formats. This is the]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://pyconuk.org/">PyCon UK</a> 2007 I gave a short talk on using Python to deal with tabular data. The <a href="http://moreati.org.uk/talks/Tabular data with Python.zip">slides, demos and modules</a> for my talk are available for download.</p>
<p>There is material in the download that I didn&#8217;t present on the day. The talk covers extracting data from various tabular data formats. This is the first step of an <a href="http://en.wikipedia.org/wiki/Extract%2C_transform%2C_load">Extract, Transform, Load (ETL)</a> operation. It also summarises the character of those data formats.</p>
<p>Comments, queries and suggestions are most welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://moreati.org.uk/blog/2007/09/30/tabular-data-with-python-slides-from-pyconuk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
