Fri, 03 Sep 2010
I'm currently working on a Fortran program at work: a post-processing tool that takes climate data, in NetCDF format, and outputs in CMOR2 format (a NetCDF variant with climate conventions). So, it links against netcdf and cmor.
Now in HPC, and climate in particular, codes are typically linked statically: partly for robustness, but mostly for speed (more on which later). I have tens of terabytes of data to process, so I'd like to link this tool statically. Until now, I've mostly been linking using pkg-config:
gfortran -o nemo-rewriter nemo-rewriter.f90 `pkg-config --libs --cflags netcdf cmor`
pkg-config assembles the list of libraries. In the dynamic case, the netcdf and cmor shared libraries are themselves linked to their dependencies. In the static case, every dependency needs to be on the link line, which is more complex. Never mind, it should be possible with:
gfortran -static -o nemo-rewriter nemo-rewriter.f90 `pkg-config --static --libs --cflags netcdf cmor`
This should work by assembling all the required static libraries via pkg-config dependencies. Unfortunately, not every package has a .pc file, and so this fails. As of version 4.1, NetCDF can read from a URL instead of a file, and hence depends on curl to retrieve the data. Curl has no pkg-config .pc file describing its libraries, so the link fails.
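To make the failure mode concrete, here is a toy sketch in Python of how a `pkg-config --static` style lookup follows private requirements. The metadata is hypothetical, not copied from real .pc files, and it assumes that netcdf's metadata names libcurl as a private requirement (the actual contents of netcdf.pc are not shown in this post). The function name `static_libs` and the dictionary layout are made up for illustration.

```python
# Toy model of how a `pkg-config --static` style lookup walks private
# requirements. All metadata below is hypothetical, not from real .pc files.
PC_FILES = {
    "cmor": {"libs": ["-lcmor2"], "libs_private": ["-ludunits2"],
             "requires_private": ["netcdf"]},
    "netcdf": {"libs": ["-lnetcdff", "-lnetcdf"],
               "libs_private": ["-lhdf5_hl", "-lhdf5"],
               "requires_private": ["libcurl"]},
    # Deliberately no "libcurl" entry: this models the missing .pc file.
}


def static_libs(modules, pc_files):
    """Collect linker flags for the modules and all their private requirements."""
    flags, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        if name not in pc_files:
            # One module without metadata breaks the whole chain.
            raise LookupError(f"no .pc metadata for module '{name}'")
        meta = pc_files[name]
        flags.extend(meta["libs"] + meta["libs_private"])
        for dep in meta["requires_private"]:
            visit(dep)

    for module in modules:
        visit(module)
    return flags


try:
    print(" ".join(static_libs(["cmor", "netcdf"], PC_FILES)))
except LookupError as err:
    print("static resolution failed:", err)
```

With the toy data above, the walk stops at libcurl and reports it as the missing module. Adding an entry for it lets the same call return a complete flag list.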
Never mind, let's assemble the static libraries by hand. Debian provides static versions of libraries in the -dev packages. Can I assemble a statically linked program? For this I need:
- NetCDF needs libnetcdff.a and libnetcdf.a directly.
- NetCDF needs HDF5: libhdf5_hl.a and libhdf5.a for version 4 files.
- CMOR2 needs: libcmor2.a
- CMOR2 depends on libudunits2.a, to convert between physical units.
Now here it gets interesting. To handle secure communications and authentication, curl has some complex dependencies. It comes in two versions; take the gnutls one, for example:
- libgss.a for Generic Security Services.
- libgss.a needs libidn.a for Internationalized Domain Names
- libgss.a needs libshishi.a for Kerberos
- libshishi.a needs libgpg-error.a
- libshishi.a needs libgnutls.a
- libshishi.a needs libtasn1.a
- libshishi.a needs libgcrypt.a
- libshishi.a needs libresolv.a
- libcurl.a needs libssl.a
- libcurl.a needs libssh2.a
- libcurl.a needs libldap_r.a
- libldap_r.a needs liblber.a
- libldap_r.a needs libsasl2.a, which accesses databases, so ...
- libsasl2.a needs libmysqlclient.a
- libsasl2.a needs libpq.a
- libsasl2.a needs libdb-4.8.a
- libsasl2.a needs libsqlite.a
- libssl.a needs libcrypto.a
- libcrypto.a needs libz.a
- libcurl.a needs libdl.a
- libcurl.a needs libcom_err.a
- libcurl.a needs libkeyutils.a
- libcurl.a needs libgssapi_krb5, which is in dynamic form only
- libcurl.a needs libkrb5crypto, which has no static library
I may have missed some, having stopped because there is no static implementation of Kerberos on Debian. But still, the idea that a simple little Fortran proggie will statically link in four database libraries is silly. It appears that it is no longer possible to simply link a program statically on Debian, and definitely not via pkg-config, because so many dependencies do not yet have pkg-config configuration files.
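Part of the pain of doing this by hand is ordering: with GNU ld, a static archive has to appear on the link line before the archives it depends on. As a rough sketch, the list above can be treated as a graph and sorted mechanically. The Python below (3.9 or later, for `graphlib`) transcribes only a subset of the edges. The cmor2 to netcdf, netcdff to netcdf and hdf5_hl to hdf5 edges are my assumptions about how those libraries relate; they are not stated in the list. `NEEDS` and `link_order` are made-up names.

```python
from graphlib import TopologicalSorter

# library -> libraries it needs (a subset of the list above; edges marked
# "assumed" are not stated in the list itself).
NEEDS = {
    "cmor2": ["netcdf", "udunits2"],      # cmor2 -> netcdf is assumed
    "netcdff": ["netcdf"],                # assumed: Fortran bindings wrap the C library
    "netcdf": ["hdf5_hl", "hdf5", "curl"],
    "hdf5_hl": ["hdf5"],                  # assumed
    "curl": ["ssl", "ssh2", "ldap_r"],
    "ldap_r": ["lber", "sasl2"],
    "ssl": ["crypto"],
    "crypto": ["z"],
}


def link_order(needs):
    """Return -l flags ordered so each library precedes what it depends on."""
    # TopologicalSorter treats the mapped values as predecessors and emits
    # them first, so dependencies come out before their users.
    order = list(TopologicalSorter(needs).static_order())
    # A static link line wants the opposite: users first, dependencies last.
    order.reverse()
    return [f"-l{name}" for name in order]


print(" ".join(link_order(NEEDS)))
```

The exact order among unrelated libraries is not fixed, only the constraint that every library comes before its dependencies. A cycle in the graph would raise `graphlib.CycleError`, which on a real link line is the case that needs `--start-group`/`--end-group` or repeated archives.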
Posted by Adam Skutt at Fri Sep 3 23:59:00 2010
You can also get a performance gain on x86-32 due to the overhead of PIC code, but that doesn't apply to x86-64.
Posted by Anonymous at Sat Sep 4 06:22:42 2010
True. From the perspective of HPC (where a performance advantage of a few percent, if real, matters), the point is that this program will never call those functions. This program, for example, is expected to run for several weeks (for a small post-processing tool), taking in a file and writing out a file, all locally. Is a "partially static" build possible, where the code that will actually be executed is statically linked, and the remaining calls are available dynamically?
Perhaps a tool could do a 'profile run' to determine which functions get used, and then link those in statically.
Anonymous:
I'm (when time permits) re-examining the truth (in detail) of static-vs-shared assumptions on modern architectures. For HPC, the cost of processing relocations is irrelevant; the overhead of PIC, if any, is not: indirect accesses, code bloat leading to cache misses, etc.
Posted by amckinstry at Sat Sep 4 09:47:36 2010
And there's still overhead due to PIC on x86_64 (an additional dereference through the GOT), which is smaller than the overhead on x86, but still present.
Posted by Adam Skutt at Sat Sep 4 11:49:41 2010
And by the way, getting a static list of library dependencies for a static library does not make much sense, as a well written static library should allow you to do without most of its dependencies if you only use functionality that does not need those dependencies.
(But since that needs proper interface design, so that it can be decided at link time whether some functionality is needed, and because no one uses static libraries anyway, that will hardly ever be done.)
Posted by Bernhard R. Link at Sat Sep 4 12:56:45 2010
Posted by Bernhard R. Link at Sat Sep 4 13:06:09 2010
I'm trying to build an optimized version of a simple program, avoiding the PIC overhead. The 'normal' way of doing this is to statically link it. I'm just showing that this is no longer possible (with netcdf 4.1).
I agree with what you say about building netcdf without curl support, and have in fact done this. At work I've built two versions of netcdf on our supercomputers: the dynamically-linked full-featured version, and the statically-linked curl-free version. But recommending to scientists that they build their own versions of 5 libraries (netcdf, hdf5, udunits, uuid, cmor) for their application isn't really a runner. (See here for example). We need to do better than this.
I agree with what you say about dynamic linking under the hood, and am not particularly worried about building properly statically linked codes (only a subset of programs like /sbin/ldconfig really care in that way). I would be happy if I could 'just' build the program so that all the code that mattered was PIC-free.
I'm optimising the program for a particular use-case (e.g. files locally present, curl not used), which is typical in HPC. What I'm working towards is a build tool (using pkg-config underneath) that profiles an existing (dynamically linked) program, examines which symbols it uses during a run, and rebuilds an optimized version from PIC-free static libraries for those objects, and dynamically-linked libraries for the rest.
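A very rough sketch of the selection step such a tool might perform, in Python. It assumes the profile has already produced a set of symbol names, however that was gathered, and that we know which symbols each library exports. The `EXPORTS` table is a hypothetical fragment and `mixed_link_flags` is a made-up name. `-Wl,-Bstatic` and `-Wl,-Bdynamic` are the GNU ld switches for choosing static or shared versions of the `-l` libraries that follow them.

```python
# Hypothetical fragment: library -> some of the symbols it exports.
EXPORTS = {
    "netcdf": {"nc_open", "nc_get_vara_double"},
    "hdf5": {"H5Fopen"},
    "curl": {"curl_easy_perform"},
}

# Hypothetical result of a profile run on local files: curl never shows up.
USED = {"nc_open", "nc_get_vara_double", "H5Fopen"}


def mixed_link_flags(exports, used_symbols):
    """Link libraries with observed symbols statically, the rest dynamically."""
    static, dynamic = [], []
    for lib, symbols in exports.items():
        # A library is "hot" if the run touched at least one of its symbols.
        (static if symbols & used_symbols else dynamic).append(f"-l{lib}")
    flags = []
    if static:
        flags += ["-Wl,-Bstatic", *static]
    # Always switch back, so the unused libraries and the system libraries
    # that the compiler driver appends are linked dynamically.
    flags.append("-Wl,-Bdynamic")
    flags += dynamic
    return flags


print("gfortran -o nemo-rewriter nemo-rewriter.f90",
      " ".join(mixed_link_flags(EXPORTS, USED)))
```

This only decides which group each library lands in. Ordering within the static group would still have to respect dependencies, and a whole-library decision is coarser than the per-object selection described above.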
Posted by amckinstry at Sat Sep 4 13:50:33 2010
Posted by Anonymous at Sat Sep 4 21:17:53 2010
static linking. If you want to statically link in curl, you'll need to recompile it (curl) without krb.
See http://bugs.darcs.net/issue806 and http://bugs.debian.org/495163
Posted by Trent W. Buck at Mon Sep 6 12:29:44 2010