Thu, 18 Jun 2009
As mentioned before, I've started working on Debian Meteorology, adding "standard" meteorology-related packages to Debian. Part of the aim is to jump-start an effort to integrate the FLOSS used in the field: all the usual libraries that people working in meteorology use and expect to find on their supercomputers and workstations.
Two packages I've been working on are Magics++ and zyGrib, which are plotting and visualisation tools, respectively. Both contain coastline maps of the world. Digging deeper shows they use the same files: a binary database called 'GSHHS', or Global Self-consistent Hierarchical High-resolution Shorelines. Some scope for integration here.
So I started investigating GSHHS in order to create a 'coastline data' package to be shared. It turns out that building GSHHS depends on GMT, the Generic Mapping Tools, which is already present in Debian; this coastline issue has been explored before, and a package, gmt-coast-low, was created.
"gmt-coast-low" is 5.5 MB in size, and as its name suggests, there was once a "gmt-coast-high", but this has since been dropped for taking up too much space in the Debian archive (in its place, a script which will download this data for you has been created). But the files in gmt-coast-low are in netCDF rather than GSHHS's own binary format; what to do? I posted a mail asking for help, and it turns out that another package is being considered, Basemap, an add-on for Matplotlib, that also includes the GSHHS data.
I've summarized the files, sizes and versions here in the Debian Wiki. Offhand it appears that there is scope for re-adding a gmt-coast-high package (with perhaps additional small data files on state boundaries, etc., as seen in Basemap), though some questions remain:
- Is 170 MB of arch-independent data too much these days in the Debian archive, especially since it appears at least 4 packages can use it?
- It seems that some packages would need to be patched to bring them up to date with the latest format version for the database. What format should the data be in: this special binary format (quite simple) or netCDF?