Employing modern C++ for high performance delta-coding compression

Eduardo Madrid

60 minute session
15:30-16:30, Wednesday, 28th June 2023

C++ is an ideal language for tasks such as compressing time-series data with delta encoding, because it lets you maximize performance while keeping composability and other software engineering qualities.

This presentation will showcase several things:

  1. The application of C++ "tricks" to achieve something similar to "reflection" (introspection) so that users can describe their data layouts and tie them to compression parameters and options.
  2. A good explanation of delta compression, including prior work by Alexander Stepanov, the inventor of the Generic Programming paradigm, who used this technique together with collaborators.
    1. For some applications, such as the dissemination of financial exchange market data, the compression can be fast enough, and shrink the data to be disseminated enough, that it pays for itself in reduced latency!
  3. An outline of columnar databases that rely on delta compression.
  4. Several counter-intuitive principles about how to microbenchmark very low latency code, typically written in C++.
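
To make the idea of delta coding concrete, here is a minimal sketch using only the C++ standard library. It is not taken from the library used in the session; the names `delta_encode` and `delta_decode` are hypothetical and chosen for illustration. Each value after the first is replaced by its difference from the previous value, so slowly changing series such as timestamps turn into small numbers that a later stage can store in fewer bits.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): keeps the first value as-is and
// replaces every later value with its difference from its predecessor.
std::vector<std::int64_t> delta_encode(const std::vector<std::int64_t>& values) {
    std::vector<std::int64_t> deltas(values.size());
    std::adjacent_difference(values.begin(), values.end(), deltas.begin());
    return deltas;
}

// Hypothetical helper (illustration only): the inverse transform, a running
// sum of the deltas that restores the original values.
std::vector<std::int64_t> delta_decode(const std::vector<std::int64_t>& deltas) {
    std::vector<std::int64_t> values(deltas.size());
    std::partial_sum(deltas.begin(), deltas.end(), values.begin());
    return values;
}
```

For example, the timestamps 1000, 1002, 1003, 1007 encode to 1000, 2, 1, 4, and decoding those deltas returns the original sequence. The sketch only performs the transform; packing the small deltas into fewer bits is a separate step that it does not attempt.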

We will rely on an open-source library that I developed to cover these topics.

micro benchmarks
information theory
low latency
posting lists
columnar databases

Eduardo Madrid

High Performance and Generic Programming C++ Software Engineer, with experience in financial technologies and other fields. Has presented several times at user groups and C++ conferences.