Linking #
Source code: bit-bcast/linking
In order to understand the issues of packaging, we’ll need to understand how linking works for C/C++ code. In particular, we’ll need to learn
- the difference between static and shared libraries,
- how dynamically loading libraries works,
- when global variables are initialized,
- the layout of executables/libraries,
- and the different types of symbols.
Please remember there are definitely better places[1] to learn about this. I write to remember (and publish because why not). Better references could be:
Let’s deal with shared libraries: LD_LIBRARY_PATH, RPATH, LD_PRELOAD and dlopen/dlsym.
Shared Libraries #
Search Order #
The following locations are searched, in this order, for the filename of the shared library:
- directories listed in RPATH, if RUNPATH isn’t set;
- directories listed in LD_LIBRARY_PATH;
- directories listed in RUNPATH;
- the cache file /etc/ld.so.cache;
- /lib (or /lib64);
- /usr/lib (or /usr/lib64).
Note that {R,RUN}PATH is stored in the binary that’s trying to find the
shared library.
Remember that LD_PRELOAD will interfere and effectively take precedence over
all of the above.
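To make the order concrete, here is a minimal sketch that models it as a plain function. It does not talk to the dynamic linker at all; `LinkContext` and `search_order` are hypothetical names invented for this illustration, and the lib64 variants are left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of what is known when a bare library name
// (no slash) is resolved on behalf of one binary.
struct LinkContext {
    std::vector<std::string> rpath;            // RPATH of the requesting binary
    std::vector<std::string> runpath;          // RUNPATH of the requesting binary
    std::vector<std::string> ld_library_path;  // taken from the environment
};

// Returns the locations that are tried, in order. The cache is a file,
// not a directory, so it is represented by its file name.
std::vector<std::string> search_order(const LinkContext& ctx) {
    std::vector<std::string> order;
    auto append = [&order](const std::vector<std::string>& dirs) {
        order.insert(order.end(), dirs.begin(), dirs.end());
    };
    if (ctx.runpath.empty()) {
        append(ctx.rpath);  // RPATH only counts if RUNPATH isn't set
    }
    append(ctx.ld_library_path);
    append(ctx.runpath);
    order.push_back("/etc/ld.so.cache");
    order.push_back("/lib");
    order.push_back("/usr/lib");
    assert(order.size() >= 3);  // the three fixed entries are always present
    return order;
}
```

The point of the model is the asymmetry: a binary with only RPATH beats LD_LIBRARY_PATH, while a binary with RUNPATH is overridden by it.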
Source:
- https://man7.org/linux/man-pages/man8/ld.so.8.html
- https://man7.org/linux/man-pages/man3/dlopen.3.html
Preloading #
Preloading is somewhat different in that the libraries listed in LD_PRELOAD are loaded before all others. Therefore, by the time a regular library gets to resolve its symbols, they might already be provided by the preloaded library.
This is commonly used to add instrumentation of common dependencies, such as MPI, BLAS or HDF5, to a program without recompiling it.
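The text does not show what such an instrumenting library looks like at this point. One widespread approach on glibc, which is not necessarily what the accompanying repository does, is to define the function under the same name and forward to the next definition in the search order via `dlsym(RTLD_NEXT, ...)`. The Relative Paths section further down uses `dlopen` instead. In this sketch `next_symbol` is a hypothetical helper, and `lmmr_add` is assumed to have C linkage and the signature `double(double, double)`.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_NEXT is a GNU extension
#endif
#include <dlfcn.h>

#include <cassert>
#include <cstdio>

// Looks up `name` in the objects that come after the calling one in the
// search order; returns nullptr if no later object defines it.
void* next_symbol(const char* name) {
    assert(name != nullptr);
    return dlsym(RTLD_NEXT, name);
}

// Instrumented stand-in for lmmr_add, meant to live in a preloaded library.
extern "C" double lmmr_add(double a, double b) {
    using add_fn = double (*)(double, double);
    // Resolved once, on first call.
    static const auto real = reinterpret_cast<add_fn>(next_symbol("lmmr_add"));
    std::fprintf(stderr, "lmmr_add(%g, %g)\n", a, b);
    assert(real != nullptr && "no later definition of lmmr_add found");
    return real(a, b);
}
```

Compiled into a shared library and listed in LD_PRELOAD, the wrapper would be found first, log the call, and then hand over to the original implementation.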
Dynamic Loading #
Using dlopen and dlsym one can create a function pointer to a symbol in a
particular file. Two common use cases:
- Plugins, i.e. at runtime certain parts of the program are added conditionally;
- Fight pip by compiling the C extension of the Python package against every imaginable version of MPI, BLAS or other performance-critical libraries. Package all versions in a single Python package. Then, at runtime, dynamically load the appropriate one, depending on what is found on the system the package ends up running on.
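The "load the appropriate one" step of the second use case could look roughly like the following sketch: try a list of candidates and keep the first one that loads. `load_first_available` is a hypothetical helper, not something taken from the source, and the candidate names are entirely up to the caller.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tries each candidate in turn and returns the index of the first one
// that dlopen can load, or -1 if none can. On success *handle is set and
// must later be released with dlclose.
int load_first_available(const std::vector<std::string>& candidates, void** handle) {
    assert(handle != nullptr);
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        *handle = dlopen(candidates[i].c_str(), RTLD_NOW | RTLD_LOCAL);
        if (*handle != nullptr) {
            return static_cast<int>(i);
        }
    }
    *handle = nullptr;
    return -1;
}
```

A package shipping several builds of its extension would pass the builds in order of preference; a build whose dependencies are missing on the system fails to load, and the next candidate is tried.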
Relative Paths #
When loading the shared object, one must specify a path. If this path is a
bare filename, i.e. it contains no slash, the usual rules for finding a shared
library apply; a name containing a slash is interpreted as a (relative or
absolute) pathname and opened directly. This is interesting in the context of
creating wrappers. Say we wanted to create an instrumented version of lmmr_add
and then preload that. How does the wrapper avoid reimplementing the body of
the original function? One answer could be to simply dynamically load LMMR,
then load the symbol lmmr_add and call it via the function pointer. Like so:
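#include <dlfcn.h>  // dlopen, dlsym, dlclose

// Bare filename (no slash), so the usual search order applies.
void* liblmmr = dlopen("liblmmr.so", RTLD_NOW);
// dlsym returns void*; cast it to the signature of lmmr_add.
auto dyn_lmmr_add = (double (*)(double, double))(dlsym(liblmmr, "lmmr_add"));
dyn_lmmr_add(a, b);
dlclose(liblmmr);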
We observe the loading behaviour described above:
$ LD_LIBRARY_PATH=$PWD/v1 ./main
v1: lmmr_add(double, double)
$ LD_LIBRARY_PATH=$PWD/v2 ./main
v2: lmmr_add(double, double)
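The snippet above skips error handling. A slightly more defensive variant might look like the sketch below; `call_binary` is a hypothetical helper that assumes the looked-up symbol has the signature `double(double, double)`, which the caller has to guarantee since `dlsym` cannot check it.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <stdexcept>
#include <string>

// Loads `library`, looks up `symbol` as a double(double, double) function,
// calls it and unloads the library again. Throws if either step fails.
double call_binary(const std::string& library, const std::string& symbol,
                   double a, double b) {
    assert(!library.empty() && !symbol.empty());
    void* handle = dlopen(library.c_str(), RTLD_NOW);
    if (handle == nullptr) {
        throw std::runtime_error(dlerror());
    }
    dlerror();  // clear any stale error before calling dlsym
    void* address = dlsym(handle, symbol.c_str());
    if (address == nullptr) {
        const char* msg = dlerror();
        // Copy the message before dlclose, which may invalidate it.
        std::string what = msg ? msg : "symbol resolved to a null address";
        dlclose(handle);
        throw std::runtime_error(what);
    }
    auto fn = reinterpret_cast<double (*)(double, double)>(address);
    double result = fn(a, b);
    dlclose(handle);
    return result;
}
```

For the example in the text this would be called as `call_binary("liblmmr.so", "lmmr_add", a, b)`. Since the name contains no slash, which copy gets loaded still depends on LD_LIBRARY_PATH and the rest of the search order.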
[1] Milan Stevanovic: Advanced C and C++ Compiling