Tree Compression with Top Trees Revisited
Tree Compression with Top Trees Revisited Documentation

This implementation accompagnies the paper "Tree Compression with Top Trees Revisited" by Rajeev Raman and me (Lorenz Hübschle-Schneider), to be published in the SEA 2015 proceedings. A technical report is available on the arXiv (arXiv:1506.04499 [cs.DS]).

The central part is the implementation of top tree compression, but this repository also contains RePair compression, the generation of random trees, and some utilities around it.

Implementation

Nearly all of the actual code is implemented in the header files (.h). This allows the compiler to perform extensive optimisations. Each of the .cpp files in this folder can be compiled into an executable with GNU Make and the provided Makefile.

The executables are:

A Note on Experiments

Some of the binaries output a line starting with "RESULT" and containing key-value pairs. This line is for parsing with Timo Bingmann's SqlPlotTools and was used to automatically update the plots, tables, and numbers in the paper as experiments were re-run after changes in the code. Don't give copy-paste errors a chance!

Compiling

All algorithms are implemented in C++11 and have been tested with the GNU C++ compiler, g++, in version 4.9 and clang++ in version 3.6. To build one of the above executables including debug assertions (which can cause significant overhead!), the executable name serves as make target, e.g. make coding. A version with debug information and without optimisations for debugging can be compiled by appending Debug to the executable name, while appending NoDebug disables assertions (example: make randomEvalNoDebug). For some executables, a target for profile guided optimisation builds is available as well, this might require changing of XML file paths in the Makefile. The suffix for these is PGO. The binaries will be suffixed with -p, e.g. coding-p.

Notes: clang++ requires the gold linker and linker plugin to use link-time optimisations, which are enabled by default on all non-Debug builds. If this causes problems on your systems, remove -flto from line 6 and all occurences of = in the Makefile.

Prior to version 3.6, the clang++ compiler did not support debug information with the upcoming c++1y standard. If you require support for clang++ 3.4 or 3.5, remove -g from the flags.

Additional make targets are available for static analysis with cppcheck (warning: many false positives, as it does not seem to have, among others, support for C++11 lambdas) and scan-build.

The code was tested and run under Debian GNU/Linux in the unstable distribution as of June 2015, but should work on all Linux-based systems. Adaptation for other operating systems will most likely require a change of paths (e.g. '/tmp'), the makePathRecursive function in Common.h and the drawSvg function in DotGraphExporter.h. Additionally, the Makefile requires adaptation to discover the number of available processors for link-time optimisation. POSIX threads (pthreads) are a requirement for the randomEval and randomVerify commands.