LIKWID

LIKWID toolkit

Performance Counters

Instrument regions you want to measure as follows:

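#include <likwid-marker.h>  // LIKWID 5.x header; build with -DLIKWID_PERFMON

void loop() {
  // Every thread starts its counters for region "loop".
  #pragma omp parallel
  LIKWID_MARKER_START("loop");

  // The work to be measured.
  #pragma omp parallel for
  for(...) {
  }

  // Every thread stops its counters again.
  #pragma omp parallel
  LIKWID_MARKER_STOP("loop");
}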

This instructs LIKWID to read the performance counters on each core when the region starts and when it stops. Naturally, you can condense the separate omp parallel regions into fewer ones. Moreover, the threads don’t need to be OpenMP threads.
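As a sketch of the condensed form: a single omp parallel region can hold the start marker, the work-sharing loop and the stop marker. The kernel `sum_array` is a hypothetical example and not part of LIKWID. The `#else` branch assumes the common pattern of defining the marker macros as empty so the file still builds without LIKWID, and `<likwid-marker.h>` assumes LIKWID 5.x.

```c
#include <stddef.h>

#ifdef LIKWID_PERFMON
#include <likwid-marker.h>  /* assumption: LIKWID 5.x header name */
#else
/* Without -DLIKWID_PERFMON the markers compile away to nothing. */
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#endif

/* Hypothetical kernel: sums n doubles inside one marked region. */
double sum_array(const double *x, size_t n) {
  double total = 0.0;

  #pragma omp parallel
  {
    /* Each thread starts its own measurement of region "loop". */
    LIKWID_MARKER_START("loop");

    #pragma omp for reduction(+ : total)
    for (size_t i = 0; i < n; ++i) {
      total += x[i];
    }

    /* Reached after the implicit barrier at the end of the omp for. */
    LIKWID_MARKER_STOP("loop");
  }
  return total;
}
```

Because the stop marker sits behind the loop's implicit barrier, each thread's region time includes waiting for the slowest thread, much like in the three-region version.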

Initialization:

int main() {
  LIKWID_MARKER_INIT;

  #pragma omp parallel
  LIKWID_MARKER_REGISTER("loop");

  loop();

  LIKWID_MARKER_CLOSE;
}

This does two things for us:

  1. It pins threads to cores. This happens during the first parallel region.
  2. It prepares the data structures required to accumulate measurements for region "loop".

In micro-benchmarks it can be essential to do both; measurements can be surprisingly far off otherwise.
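Putting instrumentation and initialization together, a minimal sketch of the whole marker lifecycle could look as follows. The workload `count_multiples_of_three` and the wrapper `run_instrumented` are hypothetical names chosen for illustration. As an assumption, the marker macros are defined as empty when `LIKWID_PERFMON` is not set, so the same file also builds on a machine without LIKWID; `<likwid-marker.h>` assumes LIKWID 5.x.

```c
#ifdef LIKWID_PERFMON
#include <likwid-marker.h>  /* assumption: LIKWID 5.x header name */
#else
/* Fallback: all markers become no-ops when LIKWID is not used. */
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_REGISTER(tag)
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical workload: counts the multiples of 3 in [0, n). */
static long count_multiples_of_three(long n) {
  long hits = 0;

  #pragma omp parallel
  {
    LIKWID_MARKER_START("loop");
  }

  #pragma omp parallel for reduction(+ : hits)
  for (long i = 0; i < n; ++i) {
    if (i % 3 == 0) {
      ++hits;
    }
  }

  #pragma omp parallel
  {
    LIKWID_MARKER_STOP("loop");
  }
  return hits;
}

/* Full lifecycle: init, register per thread, measure, close. */
long run_instrumented(long n) {
  LIKWID_MARKER_INIT;

  /* Registering up front keeps the setup cost out of the measurement. */
  #pragma omp parallel
  {
    LIKWID_MARKER_REGISTER("loop");
  }

  long hits = count_multiples_of_three(n);

  LIKWID_MARKER_CLOSE;
  return hits;
}
```

Calling `run_instrumented` once from `main` is enough. To actually collect counters, the file would be compiled with OpenMP enabled and `-DLIKWID_PERFMON` and linked against the LIKWID library; the exact include and library paths depend on the installation.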

Call likwid-perfctr with -C (pins the threads and selects the cores to measure) and -m (enables the marker API), e.g.

likwid-perfctr -g GROUP -C S0:0-4 -m ./a.out

Be mindful of NUMA effects when pinning to consecutive cores.

Laptops and Desktops

From my limited understanding, support for server hardware is prioritized. Since adding new hardware requires (a lot of) manual labor and access to such a CPU, support for laptops might not be complete.

AMD doesn’t allow us to have memory stats, i.e. the group MEM is missing for hardware reasons and not because of an installation issue. The groups that are available on a given machine can be listed with likwid-perfctr -a.