"You take a million, billion tonnes of flaming inferno and turn it into 'twinkle, twinkle little star' ..."

Sun, 21 Jun 2009

Debian Meteorology : Status, Summer Solstice 2009.

Unfortunately I won't be able to make it to DebConf9, so as an aid to those who are going, here's a summary of current work:

I've just uploaded terralib 3.3.1 to Debian, and it's sitting in the NEW queue, as the older version was removed from the archive due to lack of maintenance (it had an RC bug: it both included its own copy of libtiff and failed to link against it - now it links against external copies of libgeotiff and libtiff).

In the NEW queue it joins g2clib, hdf-eos4, hdf-eos5 and udunits. These are there as dependencies of other Meteorology-related packages I'm working on: magics++ needs terralib and gshhs; zygrib needs gshhs (it has a copy built-in). NCL (NCAR Command Language) has a rake of dependencies including udunits, hdf-eos, g2clib and vis5d+ (ITP'd). I'm also packaging VISIT for visualization.
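
The dependency chain above can be sketched as a small graph and sorted into one possible upload order. This is only an illustration using Python's standard-library graphlib module (Python 3.9 or later): the graph lists just the relationships named in this post, 'hdf-eos' stands in for hdf-eos4/hdf-eos5, and the real Build-Depends fields are certainly longer.

```python
from graphlib import TopologicalSorter

# Package -> packages it needs, taken from the paragraph above.
# Illustrative only: this is not the real set of Build-Depends.
NEEDS = {
    "magics++": {"terralib", "gshhs"},
    "zygrib": {"gshhs"},
    "ncl": {"udunits", "hdf-eos", "g2clib", "vis5d+"},
}


def upload_order(needs):
    """Return one order in which every package follows its dependencies."""
    return list(TopologicalSorter(needs).static_order())


order = upload_order(NEEDS)
print(order)
```

Any order it prints puts terralib and gshhs ahead of magics++, which is the point: the libraries have to clear NEW before the applications can build.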

Then there is the GSHHS issue: I think I'll end up packaging 'gmt-coastline-high', but the format of the coastline maps needs to be decided (netCDF or its own binary format) and updating the sundry packages to read the latest version needs tackling.

I'm packaging these as they are used at ICHEC and I've experience building them. One of the main aims I had in setting up Debian Meteorology (beyond adding the software to Debian) was to help integrate all the Free and Open Source code in the Earth-sciences field, and sort out dependency and build issues. I hadn't expected to encounter quite so many so quickly, though. I don't expect to get more done before vacation-time, but I'll be happy if I get these done this summer.

Thu, 18 Jun 2009

Maps and Coastlines in Debian

As mentioned before, I've started working on Debian Meteorology, adding "standard" meteorology-related packages to Debian. Part of the aim of this is to jump-start an effort of integrating the FLOSS in the field: all the usual libraries that people working in the field use and expect to be on the supercomputers and workstations they use.

Two packages I've been working on are Magics++ and zyGrib, which are plotting and visualisation tools, respectively. Both contain coastline maps of the world. Digging deeper shows they use the same files: a binary database called 'GSHHS', or Global Self-consistent Hierarchical High-resolution Shorelines. Some scope for integration here.

So I started investigating GSHHS in order to create a 'coastline data' package to be shared. It turns out that building GSHHS depends on GMT, the Generic Mapping Tools, which is already present in Debian. This coastline issue has been explored before, and a package, gmt-coast-low, was created.

"gmt-coast-low" is 5.5 MB in size, and as its name suggests, there was once a "gmt-coast-high", but this has since been dropped for taking up too much space in the Debian archive (in its place, a script which will download this data for you has been created). But the files in gmt-coast-low are in netCDF rather than GSHHS's own binary format; what to do? I posted a mail asking for help, and it turns out that another package is being considered: Basemap, an add-on for Matplotlib, which also includes the GSHHS data.

I've summarized the files, sizes and versions here in the Debian Wiki. Offhand it appears that there is scope for re-adding a gmt-coastline-high package (with perhaps additional small datafiles on states boundaries, etc. seen in Basemap), though some questions remain:

  • Is 170 MB of arch-independent data too much these days in the Debian archive, especially since it appears at least 4 packages can use it?
  • It seems that some packages would need to be patched to bring them up to date with the latest format version of the database. What format should the data be in: this special binary format (quite simple) or netCDF?

Tue, 07 Apr 2009

Debian Meteorology

I've added a Debian Meteorology section to the DebianScience page on Debian Wiki. The aim is to add the free and open-source meteorology packages I currently maintain and work on at ICHEC to Debian.

So far I've packaged CDO (Climate Data Operators), and am working on EMOSLIB, an interpolation library. Enrico Zini is packaging GRIB API. Other interesting packages include the OASIS coupler and the VISIT visualization software. Another task is adding support for meteorology data formats to /etc/magic, with desktop icons, MIME filetypes, etc.

Tasks to investigate include (1) finding out what other software people are interested in, and (2) getting added to Debian Pure Blends in Alioth.

Wed, 26 Sep 2007

GIT Changelog comments

My job involves working on other people's scientific codes, and recently I've been experimenting with using GIT as a VCS. It's very handy for making a quick repository while you change one or two files, nice and lightweight.

What distinguishes science codes from other software projects, both open-source and proprietary, is the comparative lack of formal releases. Often the developers are the users, making changes for their own use. They make a model, write a paper, done. They'll take their old code and edit it to add features, but don't plan on "releasing" it in any formal way. So, often you get a tarball from a fellow researcher with the name 'astro.tar.gz', and beyond the paper describing the science, you'll be lucky to find a changelog. Online repositories, test suites, etc. are a rarity.

So I've been wondering how to improve on this, in terms of refactoring, etc., and how GIT and other distributed VCSs can best be used in science. Some ideas:

  • Keep a separate cleanup branch. When working on a code to add new functionality, keep a branch for simple bugfixes to the original code. Git makes it easy to merge branches, so develop features on a per-feature branch, and keep fixes to a dedicated 'cleanup' branch that the original developer could pull from.
  • Tag or label the changes in the Git Changelog. Current labels (the word tag is overloaded) I've been using are:
    • BUGFIX: for fixes.
    • RF: for refactorings
    • DOC: for when adding comments to code I didn't understand.
    • NEW: new functionality
    This makes it easy for someone to see if they want to pull these patches into their version of the code.
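
As a rough sketch of how such labels could be put to use, the Python below groups commit subject lines by their leading label. The "LABEL: subject" layout, the UNLABELLED bucket and the example subjects are all assumptions made for illustration; they are not taken from a real repository.

```python
import re
from collections import defaultdict

# Labels from the list above; the "LABEL: subject" layout is an assumption.
LABELS = ("BUGFIX", "RF", "DOC", "NEW")
LABEL_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(LABELS))


def group_by_label(subjects):
    """Group commit subject lines by their leading label."""
    groups = defaultdict(list)
    for subject in subjects:
        subject = subject.strip()
        match = LABEL_RE.match(subject)
        if match:
            groups[match.group(1)].append(match.group(2))
        else:
            # Anything without a known label goes in a catch-all bucket.
            groups["UNLABELLED"].append(subject)
    return dict(groups)


# Hypothetical subject lines, one per commit.
example = [
    "BUGFIX: fix off-by-one in grid loop",
    "RF: split main routine into modules",
    "DOC: comment the boundary conditions",
    "NEW: add netCDF output",
    "tidy whitespace",
]
groups = group_by_label(example)
print(groups["BUGFIX"])
```

In practice the subject lines could come from `git log --format=%s`, one per line, so the original developer can skim just the BUGFIX entries before deciding what to pull.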

Has anyone else seen similar techniques or tags in use, particularly for science?

Thu, 22 Mar 2007

Dear LazyWeb: Openid in PyBlosxom

OK, having agreed that Openid is a good idea to solve the multiple-login issue, I've been looking at enabling it on the Python-based projects I work with. Harder than it looks, for no apparent reason.

Firstly, I've succeeded in enabling an Openid server in my PyBlosxom blog. Just google for pyblosxom+openid+server and install. It works. Now I can log into Openid sites with my blog address http://blog.sceal.ie.

Alternatively, I can openid-enable the comments login on the blog by installing the openid comments plugin. Unfortunately this seems to conflict with the server above, and both seem to conflict with Debian's python-openid. All three seem to have a common original codebase that has diverged, but I haven't had time to do any software genealogy and figure out which codebase is out of date. Any ideas, anybody? (Other than emailing everyone involved.)

Secondly, I'd like to share the logins with my MoinMoin wiki. Again, a patch has been written for this, but sharing the logins between PyBlosxom and MoinMoin will need co-ordination. My current plan is to try to get all the related openid sites to use /srv/www/openid-store on the server as a common identity store. This will need patches, and probably justifies making Debian packages out of the plugins.

Finally, I'm working on a Plone website for IFAS, for observation tracking and co-ordination (see aop.irishastronomy.org and aop-test.irishastronomy.org if you're interested). Here, all the users already have logins on a related system in PHPBB on a different host. It would be nice to integrate them: query the SQL database on the PHPBB site? Give them openid identities from the PHPBB machine?

Mon, 19 Feb 2007

Lustre now in Debian

Kudos to whoever is now working on the NEW queue - Lustre 1.5.95 is now in Debian. Please report any issues you see with this package, but don't trust it with production data just yet - 1.5.95 is also known as 1.6beta5, a BETA release.

I am presently working on backporting kernel fixes for kernel 2.6.18 from version 1.4.9, which has recently been released.

Fri, 16 Feb 2007

Tied up in Tentacles

Last year I was working for a small satellite ISP: we installed broadband over satellite for small communities. Our satellite provider was Italian, so the IP addresses we used were mis-identified as being Italian (not Irish). This turned out to be a problem, as Google was appearing in Italian and directing you to Italian sites; I fixed this with a hack redirecting google.com to google.ie using squidGuard in front of a transparent Squid proxy.

One year later, Ildana has gone bust, but the servers and other subnets are still in the field, some with new non-satellite uplinks. I got a call from an old customer who was having a weird problem: they could reach all of the internet except Google. Surely Google had not gone down?

A quick look showed that Google was now redirecting google.ie to google.com. Curse and recurse ad nauseam. Easily fixed, though.
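
The loop is easy to see once both redirects are written down side by side. A minimal Python sketch, assuming each redirect can be reduced to a simple host-to-host mapping (the real squidGuard rule and Google's behaviour were of course more involved):

```python
def find_redirect_loop(start, redirects):
    """Follow redirects from start; return the cycle as a list, or None."""
    seen = []
    host = start
    while host in redirects:
        if host in seen:
            # Back at a host already visited: report the cycle.
            return seen[seen.index(host):] + [host]
        seen.append(host)
        host = redirects[host]
    return None


# Hypothetical combined view of both redirects.
redirects = {
    "google.com": "google.ie",   # the proxy hack
    "google.ie": "google.com",   # the site's own redirect
}
loop = find_redirect_loop("google.com", redirects)
print(loop)
```

Each pass through the while loop either returns or records a new host, so the check always terminates, unlike the browser.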

So who's to blame? Google and many, many other websites, for ignoring language preferences stated in the HTTP headers (especially those websites that pull in adverts based on IP address while failing to pass on language preferences: we saw English-language newspaper pages with Italian adverts). Then there is the whole broken nature of GeoIP: especially with Mobile IP, nobody should be inferring location from an IP address. Try looking at geographical DNS extensions, etc. instead. And how should Squid and squidGuard cope with mutual loops?
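
For what it's worth, honouring the language preference is not much work. Here is a simplified Python sketch of picking a language from an Accept-Language header; it handles q-values but ignores the "*" wildcard, and the function names and default are my own choices for illustration, not any particular site's code.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # default weight when no q= parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            prefs.append((-quality, position, tag))
    # Highest quality first; ties keep the order given in the header.
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, available, default="en"):
    """Pick the first preferred language the site can actually serve."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0].lower()
        for candidate in (tag.lower(), primary):
            if candidate in available:
                return candidate
    return default


print(choose_language("en-IE,en;q=0.8,it;q=0.3", {"it", "en"}))
```

A browser sending that header from an Italian IP address would still get English, with no guessing from the address involved.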

Wed, 07 Feb 2007

Lustre on Debian

Just to keep anyone who's following up-to-date, Lustre 1.5.95 (alias 1.6beta5) is currently in the Debian NEW queue. It's been there two weeks; looks like NEW processing has slowed down again. It was rejected the last two times due to inadequate copyright / licensing documentation; this and other issues have been fixed.

1.5.95 seems stable enough, though I've had corruption issues on one filesystem (the main one on my test cluster, unfortunately), so try it, but don't rely on it.

Meanwhile 1.5.97 has come out, and is now in the repository. This builds and runs, but seems to have many problems on 2.6.18 kernels. These are known issues, apparently; there are bugfixes for some of them in the debian/patches directory that could be ported forward, but fortunately the nice folks at clusterfs plan on doing a release supporting 2.6.18 in the next week or so, so I'll hold off until then.

Wed, 18 Oct 2006

Testing, Testing

Playing with the blog ... Some CSS hacking due.

Firstly, down to real bugs. See DebianBug392987 to see the priority in Debian land, or LustreScratchPage to see where I'm at with Lustre.

Wed, 25 Feb 2004

iso-codes

OK, so I've uploaded iso-codes to Alioth; try here to get some details. I'll post up a tarball of the latest version and do an upload tonight, now that 0.24 is in Debian testing.