Wed, 26 Sep 2007
My job involves working on other people's scientific codes, and recently I've been experimenting with using Git as a VCS. It's very handy for making a quick repository while you change one or two files: nice and lightweight.
What distinguishes science codes from other software projects, both open-source and proprietary, is the comparative lack of formal releases. Often the developers are the users, making changes for their own use. They make a model, write a paper, done. They'll take their old code and edit it to add features, but don't plan on "releasing" it in any formal way. So, often you get a tarball from a fellow researcher with the name 'astro.tar.gz', and beyond the paper describing the science, you'll be lucky to find a changelog. Online repositories, test suites, etc. are a rarity.
So I've been wondering how to improve on this, in terms of refactoring and the like, and how Git and other distributed VCSs can best be used in science. Some ideas:
- Keep a separate cleanup branch. When working on a code to add new functionality, keep a branch for simple bugfixes to the original code. Git makes it easy to merge branches, so develop each feature on its own branch, and keep fixes on a dedicated 'cleanup' branch that the original developer could pull from.
- Tag or label the changes in the Git changelog. The labels (the word 'tag' is overloaded) I've been using so far are:
  - BUGFIX: for fixes.
  - RF: for refactorings.
  - DOC: for comments added to code I didn't understand.
  - NEW: for new functionality.
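As a rough sketch of how those labels could be put to work, the Python below groups commit subject lines by their label, for example to see which commits on a branch are plain fixes the original developer might want. It assumes each labelled commit subject starts with the label followed by a colon ("BUGFIX: ..."), which is my guess at a layout rather than anything fixed; the function names `split_label` and `group_by_label` are made up for illustration.

```python
import re
from collections import defaultdict

# Labels from the list above. The "LABEL: message" layout is an assumption.
LABELS = ("BUGFIX", "RF", "DOC", "NEW")
_LABEL_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(LABELS))


def split_label(subject):
    """Return (label, message); label is None if the subject is unlabelled."""
    subject = subject.strip()
    match = _LABEL_RE.match(subject)
    if match is None:
        return None, subject
    return match.group(1), match.group(2)


def group_by_label(subjects):
    """Group commit subject lines by label, keeping their original order."""
    groups = defaultdict(list)
    for subject in subjects:
        label, message = split_label(subject)
        groups[label].append(message)
    return dict(groups)
```

The subject lines themselves could come from something like `git log --format=%s cleanup`, and `git log --grep='^BUGFIX:'` gives a similar filter without any script at all.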
Has anyone else seen similar techniques or tags in use, particularly for science?