Linking

Source code: bit-bcast/linking

In order to understand the issues of packaging, we’ll need to understand how linking works for C/C++ code. In particular, we’ll need to learn

  • the difference between static and shared libraries,
  • how dynamically loading libraries works,
  • when global variables are initialized,
  • the layout of executables/libraries,
  • and the different types of symbols.

Please remember there are definitely better places [1] to learn about this. I write to remember (and publish because why not). Better references could be:

Let’s deal with shared libraries: LD_LIBRARY_PATH, RPATH, LD_PRELOAD and dlopen/dlsym.

Shared Libraries #

Search Order #

The dynamic linker searches the following locations, in this order, for the filename of the shared library:

  1. Directories listed in RPATH, if RUNPATH isn’t set;
  2. LD_LIBRARY_PATH;
  3. directories listed in RUNPATH;
  4. the cache file /etc/ld.so.cache;
  5. /lib (or /lib64);
  6. /usr/lib (or /usr/lib64).

Note that {R,RUN}PATH is embedded in the binary that’s trying to find the shared library; it was set when that binary was linked.

Remember that LD_PRELOAD interferes with this: libraries listed there are loaded before all others, so their symbols effectively take precedence over anything found via the search above.
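The list above can be modelled as a small function. This is only a sketch of the lookup order, not of the dynamic linker itself: search_order and split_paths are hypothetical helpers, empty entries in the path lists are simply ignored, and the cache file appears as a single entry even though it is a lookup table rather than a directory.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated list ("a:b:c") into its non-empty components.
std::vector<std::string> split_paths(const std::string& list) {
  std::vector<std::string> out;
  std::stringstream stream(list);
  std::string item;
  while (std::getline(stream, item, ':')) {
    if (!item.empty()) {
      out.push_back(item);
    }
  }
  return out;
}

// Hypothetical model of the search order: returns the locations that are
// tried, first to last, for a binary with the given RPATH and RUNPATH.
std::vector<std::string> search_order(const std::string& rpath,
                                      const std::string& runpath,
                                      const std::string& ld_library_path) {
  std::vector<std::string> order;
  auto append = [&order](const std::vector<std::string>& dirs) {
    order.insert(order.end(), dirs.begin(), dirs.end());
  };

  // 1. RPATH only counts if RUNPATH isn't set.
  if (runpath.empty()) {
    append(split_paths(rpath));
  }
  // 2. The environment variable.
  append(split_paths(ld_library_path));
  // 3. RUNPATH.
  append(split_paths(runpath));
  // 4.-6. The cache and the default directories.
  order.push_back("/etc/ld.so.cache");
  order.push_back("/lib");
  order.push_back("/usr/lib");
  return order;
}
```

With only RPATH set, its directories come before LD_LIBRARY_PATH; as soon as RUNPATH is set, RPATH drops out and LD_LIBRARY_PATH wins over RUNPATH.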

Source:

Preloading #

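Preloading is somewhat different in that the preloaded shared library is loaded first, before any of the program’s regular dependencies. Therefore, by the time a regular library gets to resolve its symbols, a definition might already be present, and the preloaded one is used.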

This is commonly used to add instrumentation of common dependencies, such as MPI, BLAS or HDF5, to a program without recompiling it.
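A minimal sketch of such an instrumentation wrapper, assuming glibc's RTLD_NEXT extension to dlsym. The function lmmr_add and its signature double(double, double) are taken from the example used later in this post; resolve_next is a hypothetical helper. The wrapper has the same name as the original, logs the call and forwards to the next definition of the symbol in load order.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_NEXT is a GNU extension
#endif
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

// Hypothetical helper: find the next definition of `symbol` in the objects
// that come after the one containing this code; nullptr if there is none.
void* resolve_next(const char* symbol) {
  return dlsym(RTLD_NEXT, symbol);
}

// Instrumented replacement with the same name and signature as the original.
extern "C" double lmmr_add(double a, double b) {
  using AddFn = double (*)(double, double);
  // Resolved once, on the first call.
  static AddFn original = reinterpret_cast<AddFn>(resolve_next("lmmr_add"));
  if (original == nullptr) {
    std::fprintf(stderr, "wrapper: original lmmr_add not found\n");
    std::abort();
  }

  std::fprintf(stderr, "wrapper: lmmr_add(%g, %g)\n", a, b);
  return original(a, b);
}
```

Compiled as a shared library (e.g. with g++ -shared -fPIC) and listed in LD_PRELOAD, this definition would be found before the one in the real library. The file and library names are up to you.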

Dynamic Loading #

Using dlopen and dlsym one can create a function pointer to a symbol in a particular file. Two common use cases:

  • Plugins, i.e. at runtime certain parts of the program are added conditionally;
  • Fighting pip: compile the C extension of the Python package against every imaginable version of MPI, BLAS or other performance-critical libraries. Package all versions in a single Python package. Then, at runtime, dynamically load the appropriate one, depending on what is found on the system the package ends up running on.
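Both use cases boil down to "try to load something and see whether it works". A minimal sketch, assuming a list of candidate library names ordered by preference; load_first_available and LoadedLibrary are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>
#include <vector>

// Result of trying several candidate libraries.
struct LoadedLibrary {
  void* handle = nullptr;  // nullptr if nothing could be loaded
  std::string name;        // the candidate that was loaded, if any
};

// Try each candidate in order and return the first one dlopen can load.
LoadedLibrary load_first_available(const std::vector<std::string>& candidates) {
  for (const auto& name : candidates) {
    void* handle = dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle != nullptr) {
      LoadedLibrary result;
      result.handle = handle;
      result.name = name;
      return result;
    }
  }
  return LoadedLibrary{};
}
```

A package that ships several builds could pass their filenames here and use whichever one loads. The caller owns the handle and should dlclose it when done.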

Relative Paths #

When loading the shared object, one must specify a filename. If it contains no slash, e.g. liblmmr.so, the usual rules for finding a shared library apply; if it contains a slash, dlopen treats it as a relative or absolute pathname and skips the search. This is interesting in the context of creating wrappers. Say we wanted to create an instrumented version of lmmr_add and then preload that. How does the wrapper avoid reimplementing the body of the original function? One answer could be to simply dynamically load LMMR, then load the symbol lmmr_add and call it via the function pointer. Like so:

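  // No slash in the name, so the usual search order decides which file is loaded.
  void* liblmmr = dlopen("liblmmr.so", RTLD_NOW);

  // Look up the symbol and cast it to the function's actual type.
  auto dyn_lmmr_add = (double (*)(double, double))(dlsym(liblmmr, "lmmr_add"));
  dyn_lmmr_add(a, b);

  dlclose(liblmmr);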

We observe the loading behaviour described above:

$ LD_LIBRARY_PATH=$PWD/v1 ./main
v1: lmmr_add(double, double)

$ LD_LIBRARY_PATH=$PWD/v2 ./main
v2: lmmr_add(double, double)
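The snippet above skips all error handling. A more defensive sketch of the same idea, assuming the symbol has the signature double(double, double); call_binary and BinaryFn are hypothetical names. Both dlopen and dlsym report failures via dlerror.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>
#include <string>

using BinaryFn = double (*)(double, double);

// Hypothetical helper: load `library`, look up `symbol` and call it with (a, b).
double call_binary(const std::string& library, const std::string& symbol,
                   double a, double b) {
  void* handle = dlopen(library.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    const char* message = dlerror();
    throw std::runtime_error(message ? message : "dlopen failed");
  }

  dlerror();  // clear any stale error before calling dlsym
  void* address = dlsym(handle, symbol.c_str());
  const char* error = dlerror();
  if (error != nullptr || address == nullptr) {
    // Copy the message before dlclose, which may overwrite it.
    std::string message = error ? error : "symbol resolved to a null address";
    dlclose(handle);
    throw std::runtime_error(message);
  }

  auto fn = reinterpret_cast<BinaryFn>(address);
  double result = fn(a, b);
  dlclose(handle);
  return result;
}
```

For the example in this post the call would be call_binary("liblmmr.so", "lmmr_add", a, b). Since the name contains no slash, LD_LIBRARY_PATH still decides whether v1 or v2 is picked up.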

  1. Milan Stevanovic: Advanced C and C++ Compiling