EuroPython 2009 open for registration

If you’ve enjoyed PyCon UK the last 2 years, the bad news is that it won’t be happening this year. The good news is that EuroPython 2009 is coming to sunny Birmingham instead, for 3 days from Tues 30th June to Thurs 2nd July, and registration is now open. The even better news is that until 14th March it’s really cheap: 50% off, making the 3 day conference only £95.

As a warm up, from Sunday we have 2 days of tutorials at the bargain price of £70. To close there will be sprints from Friday 3rd July, and in case any Django coders get homesick it all coincides with the Birmingham International Jazz Festival.

My talk on ArcGIS and IronPython has been approved. So I’ll see you there.

The Shapefile 2.0 manifesto

Geographic Information Systems (GIS) are by their nature data driven. The data comes in a wide variety of raster and vector formats. Rasters hold raw, continuous data recorded straight from the real world. An example is satellite or aerial imagery, which is commonly held in an open format with broad support, such as GeoTIFF or GeoJPEG.

Vector formats hold refined, discrete data, which has been manually traced or otherwise derived from other data sources. Examples include building outlines, contours, road routes, pipe networks, land parcels and locations. Vector data is usually traced or derived, at great expense, from raster data to encode business information. As a result it’s usually highly valuable.

Unfortunately, there are many GIS vector file formats, and most are proprietary. They can only be used to their full in their native software. Three of the biggest are AutoCAD DXF, MapInfo TAB and ArcGIS Personal Geodatabase. One vector format is unique in being both an open standard and in wide use: Shapefile.

Shapefile is publicly documented in ESRI Shapefile Technical Description by ESRI Inc., its creator. Any GIS software worth its salt can read and write the format, so it’s become the least common denominator. It is the format for storing and exchanging vector data between teams, departments, businesses and government. In my opinion this makes Shapefile the best thing ever to happen to GIS. Without it the GIS market would be a fraction of its current size.
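To give a flavour of what a publicly documented format buys you, here is a rough Python sketch that reads the fixed 100 byte header of a .shp main file, following the layout in the ESRI technical description. The function name read_shp_header and the dictionary keys are my own choices for illustration, and only a handful of the shape type codes are listed.

```python
import struct

# A few of the shape type codes from the technical description (not the full list).
SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data):
    """Parse the 100 byte header at the start of a .shp main file.

    read_shp_header is a hypothetical helper name, not part of any library.
    """
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian; the length counts 16-bit words.
    (file_code,) = struct.unpack(">i", data[0:4])
    (length_words,) = struct.unpack(">i", data[24:28])
    # Version, shape type and the bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile: file code is %d" % file_code)
    return {
        "version": version,
        "length_bytes": length_words * 2,
        "shape_type": SHAPE_TYPES.get(shape_type, "other (%d)" % shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

Reading the first 100 bytes of a file opened in binary mode and passing them to read_shp_header is enough to learn what kind of geometry the file holds and its extent, with no GIS software involved.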

MXDPERFSTAT

Investigating the performance of an intranet mapping website this week, I was introduced to MXDPERFSTAT. It’s a fantastic tool for investigating map display performance. Given an ArcMap document (a .mxd file) it runs ArcMap and loops over the map layers, displaying each at a list of scales (e.g. 1:100000, 1:10000, 1:2500, 1:500) automatically. Scale cut-offs are followed, in the same way ArcMap would.

Once done, MXDPERFSTAT writes an HTML report of the time taken and features retrieved to display each layer at each scale. Since it runs in situ, network delays and other real world bottlenecks are accounted for. It even highlights layers that are abnormally slow, or that fetch an excessive number of features. Perfect for diagnosing a slow map, or guiding a tune up of the infrastructure.

Supporting ArcGIS installations, I’m a regular searcher on ArcScripts. How did I miss this for the last 4 years, and what other gems are there? What GIS utilities do you find indispensable?

Self decrypting emails considered harmful

Sending a sensitive file, one that should be encrypted, amongst Linux and OSS geeks is doable. Most have heard of PGP, many have a GPG key (here is mine) and some even use it.

Sending an encrypted file to most people is a non-starter. The software may be there (Outlook is S/MIME capable), but the knowledge and the experience definitely aren’t. Which is a shame, because I’d like to have my bank statement securely sent to my email account.

PGP Desktop has a feature called the Self Decrypting Archive. To quote the PGP Command Line for Servers FAQ:

A Self-Decrypting Archive (SDA) is an executable containing a file that has been encrypted using a passphrase. A recipient of an SDA runs the executable and enters the passphrase to decrypt the file.

SDAs are an attempt to make encrypted email easier, by making decryption far easier for the recipient. However, Self Decrypting Archives are fundamentally insecure. Here is how they’re meant to work:

  1. Alice runs PGP Desktop to encrypt a sensitive file, so she can send it to Bob.
  2. Bob doesn’t have any encryption software, so PGP Desktop encrypts the sensitive file and appends it to a small decryptor program. The decryptor + sensitive file is the SDA.
  3. Alice sends the SDA to Bob, attached to an email. Over the phone she tells him the encryption key.
  4. Bob receives the email, and runs the SDA.
  5. The SDA requests the decryption key, and decrypts the file for Bob.

That sounds great. Alice can encrypt files, send them securely to Bob, then he can decrypt them. Bob doesn’t need any encryption software installed.

Here’s the problem: Bob is running an unverified program. Supposedly it’s from Alice, but he can’t be sure. This is exactly how email viruses spread. Bob cannot trust the SDA, since he cannot be sure what he received was really sent by Alice.

Could Alice sign the SDA, including the decryptor program? Yes, but it won’t help.

All Bob has to verify Alice’s signature on the SDA is the decryptor program in that same SDA. Here’s how Mallory, an attacker, can subvert this:

  1. Alice sends the encrypted, signed SDA to Bob, and tells Bob the encryption key.
  2. Mallory intercepts the email, and replaces the decryptor program with his own. He sends the modified SDA on to Bob, spoofing the From address.
  3. Bob receives the email, and runs Mallory’s SDA.
  4. Mallory’s decryptor, running on Bob’s machine, fakes a signature verification.
  5. Mallory’s decryptor requests the encryption key and decrypts the file for Bob. It also sends the decrypted file back to Mallory, and installs a back door on Bob’s computer.
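The steps above can be sketched in a few lines of Python. This is a toy model, not how PGP Desktop works: an HMAC stands in for Alice’s signature, a dict stands in for the SDA, and every name (build_sda, bob_runs, mallory_verifier) is made up for illustration. The point is only that Bob’s result comes from whatever verifier arrived in the archive.

```python
import hashlib
import hmac


def sign(payload, key):
    # Stand-in for Alice's signature; a real SDA would use a PGP signature.
    return hmac.new(key, payload, hashlib.sha256).digest()


def honest_verifier(payload, tag, key):
    # What a genuine decryptor would do: recompute and compare.
    return hmac.compare_digest(sign(payload, key), tag)


def mallory_verifier(payload, tag, key):
    # Mallory's replacement decryptor simply claims everything checks out.
    return True


def build_sda(payload, key, verifier=honest_verifier):
    # The archive carries its own verifier alongside the payload and signature.
    return {"verifier": verifier, "payload": payload, "tag": sign(payload, key)}


def bob_runs(sda, key):
    # Bob has no software of his own, so he can only ask the archive to check itself.
    return sda["verifier"](sda["payload"], sda["tag"], key)


alice_key = b"alice-signing-key"
genuine = build_sda(b"bank statement", alice_key)

# Mallory swaps both the payload and the verifier, leaving the old signature in place.
tampered = dict(genuine, payload=b"malware", verifier=mallory_verifier)

print(bob_runs(genuine, alice_key))    # genuine archive reports a valid signature
print(bob_runs(tampered, alice_key))   # tampered archive reports exactly the same
print(honest_verifier(tampered["payload"], tampered["tag"], alice_key))  # independent check fails
```

Only the last call, made with a verifier Bob obtained independently of the email, notices the tampering.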

The bottom line is that you and I must be able to trust our encryption software, or the encryption is pointless. For that we must be able to verify we got it from a trustworthy source. Unsigned email, or email that verifies its own signature, cannot be trustworthy.

from ESRI.ArcGIS import Geodatabase

A couple of years ago I tried to use ArcObjects, through IronPython. It didn’t quite work.

Last week I tried again, using the newly released IronPython 2.0. This time it worked better.

create_sde_conn_file.py is based on CreateSDEConnFile.java, from Creating ArcSDE connection files on the fly using Python and ArcObjects on ESRI’s Geoprocessing blog.

For those not already familiar, ArcGIS is by all accounts the market leader for Geographic Information System (GIS) software. The core of the suite comprises ArcGIS Desktop and ArcGIS Server.

On the desktop ArcMap is used to create map documents (.mxd files), whilst ArcCatalog is used to manage data sources. ArcGIS Server can (amongst other things) serve a map document as a service for web clients, Google Earth or remote ArcMap users.

ArcGIS may be automated to an extent, through an interface known as ArcGIS Geoprocessing. But this covers only some cases. Delving deeper provides much greater opportunities.

ArcGIS is built on a COM object library named ArcObjects. Native ArcGIS files, such as an ArcSDE connection (.sde file), are the in-memory COM object serialised to disk as binary. It is difficult to edit or create such files in an automated fashion without calling ArcObjects.

So, like the Java code, create_sde_conn_file.py calls ArcObjects directly. It can produce an ArcSDE connection file, suitable for ArcCatalog. It works by calling the .NET bindings, through Interop assemblies. Anything that can be done through VBA or C# should be possible through IronPython.

There are a couple of rough edges. ArcObjects is verbose, and IronPython requires some boilerplate to deal with interfaces. Instead of writing

conn_props['SERVER'] = sys.argv[2]

or even

conn_props.SetProperty('SERVER', sys.argv[2])

one needs to write:

esriSystem.IPropertySet.SetProperty(conn_props, 'SERVER', sys.argv[2])

This is explained properly in IronPython bugs 1506 and 4538.
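The verbose form is less alien than it looks. It is the ordinary Python idiom of calling a method through its type and passing the instance as the first argument. A plain Python sketch, using a hypothetical PropertySet class as a stand-in (it is not the ArcObjects type, and 'gis-host' is a made up value):

```python
class PropertySet(object):
    """Hypothetical stand-in for a property set; not the real ArcObjects class."""

    def __init__(self):
        self._props = {}

    def SetProperty(self, name, value):
        self._props[name] = value

    def GetProperty(self, name):
        return self._props[name]


conn_props = PropertySet()

# The short form: look the method up on the instance.
conn_props.SetProperty('SERVER', 'gis-host')
short_form = conn_props.GetProperty('SERVER')

# The long form: look the method up on the type, pass the instance explicitly.
PropertySet.SetProperty(conn_props, 'SERVER', 'gis-host')
long_form = PropertySet.GetProperty(conn_props, 'SERVER')

print(short_form == long_form)
```

In plain Python the two forms are interchangeable. With ArcObjects under IronPython only the second is available, with the interface playing the part of the type.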

To run the script call it as:

"c:\Program Files\IronPython\ipy.exe" create_sde_conn_file.py filename.sde hostname 5151 username password SDE.DEFAULT ""

My intention is to take this proof of concept further, and do the same with layer files and map documents. Ultimately I’d like to create a build system, able to automatically generate a complete ArcMap document from textual source files (e.g. json2mxd.py, mxd2xml.py). This would allow proper version control of the source material, and automatic deployment of ArcGIS Server map services.

In the wider scheme, it should be possible to create custom GIS applications with Python, using the full capabilities of ArcObjects and ArcGIS.

Updated 29 Jan 2009: Added some context, for those coming to this post from a Python background. Expanded goals.

meetyourmessenger.co.uk smells phishy

Here’s an email I received today:

From:     meetYourmessenger <no-reply@meetyourmessenger.com>
Subject:     You have (1) new message from Adam

Hi alex,

You have (1) unread invitation “Hello :-) ” from Adam at meetYourmessenger.co.uk

Click here
Show the message in your temporary inbox at meetYourmessenger.co.uk

I know an Adam on MSN Messenger, but he didn’t send it. McAfee SiteAdvisor says all is well, though the comments are less rosy. Until I see evidence otherwise, I’m treating meetyourmessenger as dodgy.

Only one prediction: Windows 7 will be released as Windows Vista SP 2.5

I have no evidence, and it’s wishful thinking more than anything. However, I predict that just before the expected release Microsoft will reveal Windows 7 is to be a free upgrade for Windows Vista users.

P.S. If you have any trouble posting a comment to this blog, please let me know on alex@moreati.org.uk.

Timesheets in OpenOffice Calc and Excel

Putting my timesheets in order today, I finally figured out how to make Excel deal correctly with time durations. The default is to treat values as a date/time, formatted as hh:mm. So a value such as 37:00 – meant as a duration – is displayed as 13:00 (1 PM the following day). To correct this, choose custom cell formatting, and enter the format as [h]:mm.

In OpenOffice Calc, [H]:MM is the default format for a time value (tested with 3.0), so durations work out of the box. For something pre-cooked, the OpenOffice Documentation site has a timesheet template by Vivian Lal.
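The difference between the two formats is easy to reproduce outside a spreadsheet. Here is a small Python sketch, with made up function names as_clock and as_elapsed, that mimics hh:mm (hours wrap at 24) and [h]:mm (hours keep counting) for a duration:

```python
from datetime import timedelta


def as_clock(duration):
    """Mimic hh:mm: treat the value as a time of day, so hours wrap at 24."""
    total_minutes = int(duration.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return "%02d:%02d" % (hours % 24, minutes)


def as_elapsed(duration):
    """Mimic [h]:mm: show the total elapsed hours, however many there are."""
    total_minutes = int(duration.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return "%d:%02d" % (hours, minutes)


week = timedelta(hours=37)
print(as_clock(week))    # 13:00
print(as_elapsed(week))  # 37:00
```

The stored value is the same in both cases. Only the display changes.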

Deep Zoom and others for displaying large images on the web

Slashgeo have noted the release of Deep Zoom in JavaScript, aka Seadragon, by Microsoft. Deep Zoom allows one to deliver a very high resolution image over the web, with pan and zoom. Only the portions viewed are downloaded, so bandwidth usage is minimised. Until now Deep Zoom was Silverlight only.

It works similarly to OpenStreetMap, Google Maps or Live Search Maps. A large image is transformed into a ‘pyramid’, by generating lower resolution versions (e.g. full, ½, ¼, ⅛ …) and stacking them until a peak is reached. Each level is cut into square tiles, which are stored individually in a known hierarchy. The pyramid generation is similar to mip-mapping.
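To get a feel for the numbers, here is a rough Python sketch of the pyramid arithmetic: halve the image until it fits in a single tile, counting the tiles at each level. The 256 pixel tile size and the stopping rule are assumptions for illustration. Real formats pick their own tile size and may carry on to smaller levels. pyramid_levels is a made up name.

```python
import math


def pyramid_levels(width, height, tile=256):
    """Return (width, height, columns, rows) per level, full resolution first.

    Assumes square tiles and stops once the whole image fits in one tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Each level up the pyramid is half the resolution of the one below.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels


levels = pyramid_levels(4096, 2048)
for w, h, cols, rows in levels:
    print("%5d x %-5d -> %2d x %-2d tiles" % (w, h, cols, rows))
print("total tiles:", sum(cols * rows for _, _, cols, rows in levels))
```

For a 4096 by 2048 image this gives five levels and 171 tiles, of which the full resolution level accounts for 128. The lower resolution levels add roughly a third on top, which is the same overhead as mip-mapping.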

The image might be satellite or aerial photography, a scanned map, a legal document, medical imagery (e.g. a smear test or x-ray) or any highly detailed photograph. Deep Zoom joins a collection of platforms and technologies that perform a similar role, which I’ll briefly summarise.

Firefox rendering/scrolling slow on Linux? Try resetting page zoom

For months now, I’ve found Firefox on my Linux laptop to sometimes be sluggish and CPU hogging, particularly when scrolling. T-Mobile UK and Engadget were the worst affected. Visiting t-mobile.co.uk saturated the CPU for several seconds whilst rendering. The result looked horrible: grainy, and badly pixelated.

I’d attributed this to X, Nvidia, browser sniffing, Flash and JavaScript/CSS. Of course it was me all along. Firefox 3 has a feature called Full Page Zoom. It doesn’t just resize text, it scales everything on the page. I had zoomed these pages, then forgotten.

If any of this sounds familiar, try resetting your zoom level:

  1. Visit the page that scrolls slowly or looks pixelated.
  2. Either press Ctrl + 0, or click View → Zoom → Reset.
  3. If text is now too small to read, enable View → Zoom → Zoom Text Only, then zoom in with Ctrl + +.

Firefox should now scroll the site smoothly and quickly. The zoom level is remembered on a per site basis, so repeat this for any other pages affected. If you would like to control zoom from the toolbar, try the PageZoom extension. If you would like to set the zoom globally, try No Squint (courtesy of AncientPC on Al-Osaimi Techlog).

The question remains why Full Page Zoom can be so sluggish, and under what circumstances. Also, why does Try Firefox 3 full page zoom on Mozillalinks perform so poorly for me, regardless of page zoom?