With easy to use Pythonic syntax:

The following function call offers syntactic sugar similar to what we find in Python: arguments may be specified in arbitrary order... The actual functionality is provided by `libhdf5`; H5CPP simplifies the otherwise complex use of the HDF5 C library.

    h5::ds_t h5::create( const h5::fd_t& fd, const std::string& dataset_name,
        h5::current_dims{n,m,...},
        h5::max_dims{j,k,...}, // j > n, k > m ... or H5S_UNLIMITED
        h5::chunk{1,4,5} | h5::deflate{4} | h5::compact | h5::dont_filter_partial_chunks
            | h5::fill_value{STR} | h5::fill_time_never | h5::alloc_time_early 
            | h5::fletcher32 | h5::shuffle | h5::nbit,
        h5::chunk_cache{...} | h5::all_coll_metadata_ops{...} | ... 
    );
    
  • meta-programming based variable ranks
  • optional, daisy chained flags for fine tuning
  • fine tuning internals
  • arguments may be in arbitrary order

Learn what you need now, and leave ample room to refine and perfect it later...
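
One way such order-independent arguments can be built in C++ is to give every option its own type and pick it out of the parameter pack by type. The sketch below only illustrates that technique; it is not taken from the H5CPP sources, and `demo::current_dims`, `demo::chunk`, `demo::get_or` and `demo::chunk_elements` are hypothetical names.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo { // hypothetical names, not the H5CPP API

struct current_dims { std::size_t rows, cols; };
struct chunk        { std::size_t rows, cols; };

// true if type T appears among Args...
template <class T, class... Args>
inline constexpr bool contains_v = (std::is_same_v<T, Args> || ...);

// return the argument of type T if it was passed, otherwise the fallback
template <class T, class... Args>
constexpr T get_or(const T& fallback, const Args&... args) {
    if constexpr (contains_v<T, Args...>)
        return std::get<const T&>(std::tie(args...)); // lookup by type, not position
    else
        return fallback;
}

// number of elements per chunk; the arguments may come in any order,
// and the chunk defaults to the full extent when it is omitted
template <class... Args>
constexpr std::size_t chunk_elements(const Args&... args) {
    static_assert(contains_v<current_dims, Args...>, "current_dims is required");
    const current_dims dims = get_or(current_dims{0, 0}, args...);
    const chunk        c    = get_or(chunk{dims.rows, dims.cols}, args...);
    return c.rows * c.cols;
}

} // namespace demo
```

With this in place, `demo::chunk_elements(demo::chunk{2,5}, demo::current_dims{10,20})` and the same call with the arguments swapped resolve to the same result, because each option is found by its type.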

An EXAMPLE of an event recording pipeline

    #include <h5cpp/core>
        #include "generated.h"
    #include <h5cpp/io>
    int main(){
        h5::fd_t fd = h5::create("some-file.h5",H5F_ACC_TRUNC);
        ...
        try {
            h5::pt_t pt = h5::create<sn::example::complicated_struct_t>(fd,
                "/internal/path/to/dataset", h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} );
            sn::example::complicated_struct_t event;
            for(size_t i=0; i<size; i++ )
                h5::append(pt, event);
        } catch ( const h5::error::any& e ){ ... }
    }
    
  • we are to record a sequence of events
  • with very complicated layout
  • into some file
  • and an indexable dataset within
  • with good IO properties
  • making sure things don't go wild...

The actual introspection

    #include <h5cpp/core>
        #include "generated.h"
    #include <h5cpp/io>
    int main(){
        h5::fd_t fd = h5::create("lightning-talk-example.h5",H5F_ACC_TRUNC);
        h5::pt_t pt = h5::create(fd, "stream",
            h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} );
        ...
        try {
            h5::pt_t pt = h5::create<sn::example::complicated_struct_t>(fd,
                "/internal/path/to/dataset", h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} );
            sn::example::complicated_struct_t event;
            for(size_t i=0; i<size; i++ )
                h5::append(pt, event);
        } catch ( const h5::error::any& e ){ ... }
    }
    
  • notice the included 'generated.h'
  • as the name suggests: it doesn't exist yet...

A header file with HDF5 Compound Type descriptors:

    #ifndef H5CPP_GUARD_ErRrk
    #define H5CPP_GUARD_ErRrk
    namespace h5{
        template<> hid_t inline register_struct<sn::example::Record>(){
            hsize_t at_00_[] ={7};            hid_t at_00 = H5Tarray_create(H5T_NATIVE_FLOAT,1,at_00_);
            hsize_t at_01_[] ={3};            hid_t at_01 = H5Tarray_create(H5T_NATIVE_DOUBLE,1,at_01_);
            hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (sn::typecheck::Record));
            H5Tinsert(ct_00, "_char", HOFFSET(sn::typecheck::Record,_char),H5T_NATIVE_CHAR);
            ...
            H5Tclose(at_03); H5Tclose(at_04); H5Tclose(at_05);
            return ct_02;
        };
    }
    H5CPP_REGISTER_STRUCT(sn::example::Record);
    #endif
    
  • random include guards
  • within namespace
  • template specialization for h5::operators
  • HDF5 compound types are recursively created
  • calls the template specialization when h5::operator needs it
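
What such a descriptor captures per field is essentially a name, a byte offset and a size; `HOFFSET` is the HDF5 macro that wraps the standard `offsetof`. The sketch below shows that bookkeeping without any HDF5 dependency. `demo::record_t`, `demo::field_t`, `demo::describe` and `demo::layout_is_consistent` are hypothetical names chosen for illustration, not output of the `h5cpp` compiler.

```cpp
#include <array>
#include <cstddef>

namespace demo { // hypothetical stand-ins, no HDF5 dependency

struct record_t {          // a small POD record
    char   tag;
    double position[3];
    int    count;
};

struct field_t {           // what a compound type descriptor records per field
    const char* name;
    std::size_t offset;    // the value HOFFSET(record_t, field) would yield
    std::size_t size;
};

// the kind of table a code generator could emit for record_t
constexpr std::array<field_t, 3> describe() {
    return {{
        {"tag",      offsetof(record_t, tag),      sizeof(record_t::tag)},
        {"position", offsetof(record_t, position), sizeof(record_t::position)},
        {"count",    offsetof(record_t, count),    sizeof(record_t::count)},
    }};
}

// every field must start after the previous one ends and fit inside the struct
constexpr bool layout_is_consistent() {
    constexpr auto fields = describe();
    std::size_t end_of_previous = 0;
    for (const field_t& f : fields) {
        if (f.offset < end_of_previous) return false;
        end_of_previous = f.offset + f.size;
    }
    return end_of_previous <= sizeof(record_t);
}

} // namespace demo
```

Padding inserted by the compiler shows up as gaps between one field's end and the next field's offset, which is why the offsets have to come from the compiler rather than from adding up sizes by hand.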
# How far does it go in terms of complexity?

Fairly complex POD struct

    namespace typecheck {
        struct struct_t { /*the types with direct mapping to HDF5*/
            char  _char; unsigned char _uchar; short _short; unsigned short _ushort; int _int; unsigned int _uint;
            long _long; unsigned long _ulong; long long int _llong; unsigned long long _ullong;
            float _float; double _double; long double _ldouble;
            bool _bool;
            // wide characters are not supported in HDF5
            // wchar_t _wchar; char16_t _wchar16; char32_t _wchar32;
        };
    }
    namespace other {
        struct struct_t {                    // POD struct with nested namespace
            type_alias_t                idx; // typedef type 
            double              field_02[3]; // const array mapped 
            typecheck::struct_t field_03[4]; //
        };
    }
    namespace example {
        struct complicated_struct_t {        // POD struct with nested namespace
            type_alias_t                idx; // typedef type 
            float                 array[7];  // array of elementary types
            sn::other::struct_t field_04[5]; // embedded struct 1D array
            other::struct_t  field_05[3][8]; // array of arrays 
        };
    }
    

IMAS fusion reactor data schema

  • 1000's of classes...
  • linear combination of them: think of arrays ...
  • graphs: references from one field to other record

No Problem...

Anatomy of Persistence

    #include <some_header_files>
    int main(int argc, char *argv[]) {
       sn::some_type_t object;
       write( somewhere, object, ... );
       ...
       for( size_t i=0; i<huge_number; i+=batch_size)
          read( somewhere, object, ...);
    }
  • take a program
  • with an object, having some memory layout
  • and intention to save its state to some device
  • or retrieve the data

Properties of the Object

can be categorized by

  • Content: homogeneous vs heterogeneous
  • Placement in memory: contiguous vs non-contiguous
  • How much space is used in total
  • Number of dimensions or rank

Type trait utility?

  • std::is_compound != heterogeneous
  • std::vector<T>().data() points to contiguous memory holding the values when T = int, but when T = std::string the character data may live outside that block

A purely trait-based approach requires the type to be available upfront, making it less powerful than detecting the presence of certain methods.
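
The first point can be checked directly with the standard traits. In the sketch below `demo::pod_t` and `demo::is_flat_v` are hypothetical names; `is_flat_v` is one possible approximation of "can be copied as a single block of bytes", not a trait used by H5CPP.

```cpp
#include <string>
#include <type_traits>
#include <vector>

namespace demo { // hypothetical names for illustration

struct pod_t { int id; double value; };   // heterogeneous: two different field types

// std::is_compound is true for everything that is not a fundamental type,
// so it cannot tell homogeneous from heterogeneous content
static_assert(!std::is_compound_v<double>);
static_assert( std::is_compound_v<double[3]>);  // homogeneous array
static_assert( std::is_compound_v<double*>);    // pointer
static_assert( std::is_compound_v<pod_t>);      // heterogeneous struct

// a closer approximation of "one flat block of bytes"
template <class T>
inline constexpr bool is_flat_v =
    std::is_trivially_copyable_v<T> && std::is_standard_layout_v<T>;

} // namespace demo
```

Under this definition `demo::pod_t` and `double[3]` count as flat, while `std::string` does not, which matches the observation about `std::vector<std::string>` above.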

Heterogeneous types, such as a plain old struct, class or union, require access to field names and may reside in non-contiguous memory. Possible layouts:

  • Table approach (contiguous memory, 1 heterogeneous block): each row is a heterogeneous datatype, which leads to fast indexing by rows. On the other hand, accessing a single field wastes I/O bandwidth.
  • Column layout (non-contiguous memory, 3 separate blocks, each homogeneous): provides efficient access by fields, at the cost of added implementation complexity, since each column needs its own dataset.

    // a vector of POD structs
    struct coo_t {
       size_t row;
       size_t column;
       double value;
    };
    std::vector<coo_t> sparse_matrix;
    
    // each field of the struct is a vector
    struct csc_t {
       std::vector<size_t> rowind; // row indices
       std::vector<size_t> colptr; // start of new columns
       std::vector<double> values; // nonzero values
    };
    csc_t sparse_matrix;
    
NOTE: the coordinate (COO) list is efficient for incremental construction, whereas Armadillo C++ arma::SpMat<T> uses the Compressed Sparse Column (CSC) representation.
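
To make the relation between the two layouts concrete, here is a minimal sketch that converts the coordinate list into the column-compressed form with a counting sort. `demo::to_csc` is a hypothetical helper, not part of H5CPP or Armadillo; it keeps entries within a column in their input order and does not sort them by row or merge duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace demo { // hypothetical helper for illustration

struct coo_t { std::size_t row, column; double value; };

struct csc_t {
    std::vector<std::size_t> rowind; // row indices
    std::vector<std::size_t> colptr; // start of each column, n_cols + 1 entries
    std::vector<double>      values; // nonzero values
};

inline csc_t to_csc(const std::vector<coo_t>& coo,
                    std::size_t n_rows, std::size_t n_cols) {
    csc_t csc;
    csc.colptr.assign(n_cols + 1, 0);
    csc.rowind.resize(coo.size());
    csc.values.resize(coo.size());

    // 1) count the entries of each column
    for (const coo_t& e : coo) {
        assert(e.row < n_rows && e.column < n_cols);
        ++csc.colptr[e.column + 1];
    }
    // 2) prefix sum turns the counts into column start positions
    for (std::size_t c = 0; c < n_cols; ++c)
        csc.colptr[c + 1] += csc.colptr[c];

    // 3) scatter every entry into the next free slot of its column
    std::vector<std::size_t> next(csc.colptr.begin(), csc.colptr.end() - 1);
    for (const coo_t& e : coo) {
        const std::size_t k = next[e.column]++;
        csc.rowind[k] = e.row;
        csc.values[k] = e.value;
    }
    return csc;
}

} // namespace demo
```

The result is three homogeneous blocks, which is exactly the column layout described above: each of `rowind`, `colptr` and `values` can be stored as its own dataset.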

We need an introspection/reflection method to retrieve the field names of C++ class types

C++ mailing list and papers categorized as related to reflection
  • 2019 8 entries
    • P1390R0 Suggested Reflection TS NB Resolutions Matúš Chochlík, Axel Naumann, and David Sankel
    • P1390R1 Reflection TS NB comment resolutions: summary and rationale Matúš Chochlík, Axel Naumann, and David Sankel
    • N4818 Working Draft, C++ Extensions for Reflection David Sankel
    • P1733R0 User-friendly and Evolution-friendly Reflection: A Compromise David Sankel, Daveed Vandevoorde
    • P1749R0 Access control for reflection Yehezkel Bernat
    • P1240R1 Scalable Reflection in C++ Wyatt Childers, Andrew Sutton, Faisal Vali, Daveed Vandevoorde
    • P1887R0 Typesafe Reflection on attributes Corentin Jabot
  • 2018 13 entries
    • P0194R5 Static reflection by Matúš Chochlík, Axel Naumann, David Sankel
    • P0670R2 Static reflection of functions by Matúš Chochlík, Axel Naumann, David Sankel
    • P0954R0 What do we want to do with reflection? Bjarne Stroustrup
    • P0993R0 Value-based Reflection Andrew Sutton, Herb Sutter
    • P0572R2 Static reflection of bit fields Alex Christensen
    • P0670R3 Function reflection Matúš Chochlík, Axel Naumann, David Sankel
    • P1240R0 Scalable Reflection in C++ Andrew Sutton, Faisal Vali, Daveed Vandevoorde
  • 2017 14 entries
    • P0194R3 Static reflection by Matúš Chochlík, Axel Naumann, David Sankel
    • P0385R2 Static reflection: Rationale, design and evolution by Matúš Chochlík, Axel Naumann, David Sankel
    • P0578R0 Static Reflection in a Nutshell by Matúš Chochlík, Axel Naumann, David Sankel
    • P0590R0 A design for static reflection by Andrew Sutton, Herb Sutter
    • P0598R0 Reflect Through Values Instead of Types by Daveed Vandevoorde
  • 2016 19 entries
    • P0194R0 Static reflection (revision 4) Matus Chochlik, Axel Naumann
    • P0255R0 C++ Static Reflection via template pack expansion Cleiton Santoia Silva, Daniel Auresco
    • P0256R0 C++ Reflection Light Cleiton Santoia Silva
    • P0327R0 Product types access Vicente J. Botet Escriba
    • P0341R0 parameter packs outside of templates Mike Spertus
    • Static reflection: Rationale, design and evolution Matúš Chochlík, Axel Naumann
introspection/reflection is non-trivial, not yet available as a language feature

LLVM/Clang LibTooling based static reflection

    ...
    StatementMatcher h5templateMatcher = callExpr( allOf(
       hasDescendant( declRefExpr( to( varDecl().bind("variableDecl")  ) ) ),
       hasDescendant( declRefExpr( to(
          functionDecl( allOf(
            eachOf(
                hasName("h5::write"), hasName("h5::create"), hasName("h5::read"),
                hasName("h5::append"),
                hasName("h5::awrite"), hasName("h5::acreate"), hasName("h5::aread")
            ),
    ... ));
  • identify the relevant nodes
  • marked by I/O operators
  • visit the structure in reverse topological order
  • emit the templates describing the class with fields and types
P0993R0 Value-based Reflection, Andrew Sutton, Herb Sutter:
Static reflection is a programming facility that exposes read-only data about entities in a translation unit as compile-time values. Static reflection does not require support for runtime compilation since reflection values can be used with existing generative facilities (i.e., templates) or additional generative facilities (i.e., metaprogramming) to produce new code.
Dynamic reflection provides information for navigating source-code data structures at runtime. Languages supporting dynamic reflection also tend to make additional facilities available for generating and JIT-compiling new code. Dynamic reflection and code generation are not in the scope of this work.

How about Containers? Let's take a look at N4436 C++ Detection Idiom

It is possible to identify whether a container is STL-like and provides direct access to its contiguous storage, as std::vector<T> does, or alternatively offers iterators for scatter/gather operations

  • identify if there is direct access to contiguous memory
  • or iterator for non-contiguous layouts

    template <typename T> using value_type_f = typename T::value_type;
    template <typename T> using data_f = decltype(std::declval <T>().data());
    template <typename T> using size_f = decltype(std::declval <T>().size());
    template <typename T> using begin_f = decltype(std::declval <T>().begin());
    template <typename T> using end_f = decltype(std::declval <T>().end());
    template <typename T> using cbegin_f = decltype(std::declval <T>().cbegin());
    template <typename T> using cend_f = decltype(std::declval <T>().cend());

    template <typename T> using value = compat::detected_or <T, value_type_f, T>;
    template <typename T> using has_value_type = compat::is_detected <value_type_f, T>;
    template <typename T> using has_data = compat::is_detected <data_f, T>;
    template <typename T> using has_direct_access = compat::is_detected <data_f, T>;
    template <typename T> using has_size = compat::is_detected <size_f, T>;
    template <typename T> using has_begin = compat::is_detected <begin_f, T>;
    template <typename T> using has_end = compat::is_detected <end_f, T>;
    template <typename T> using has_cbegin = compat::is_detected <cbegin_f, T>;
    template <typename T> using has_cend = compat::is_detected <cend_f, T>;

    template <typename T> using has_iterator = std::integral_constant <bool, has_begin <T>::value && has_end <T>::value >;
    template <typename T> using has_const_iterator = std::integral_constant <bool, has_cbegin <T>::value && has_cend <T>::value >;
    
credit: WG21 N4436 C++ Detection Idiom by Walter Brown
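
The aliases above rely on `compat::is_detected`, whose definition is not shown. A minimal sketch of how such a detector can be written with C++17 `std::void_t` follows; the `demo` namespace and its contents are assumptions made for illustration and may differ from the `compat` implementation used in H5CPP.

```cpp
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

namespace demo { // hypothetical re-implementation for illustration

// primary template: selected when Op<Args...> is ill-formed
template <class Void, template <class...> class Op, class... Args>
struct detector : std::false_type {};

// partial specialization: selected when Op<Args...> names a valid type
template <template <class...> class Op, class... Args>
struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};

template <template <class...> class Op, class... Args>
using is_detected = detector<void, Op, Args...>;

// archetypes: the expressions we want to probe for
template <class T> using data_f  = decltype(std::declval<T>().data());
template <class T> using begin_f = decltype(std::declval<T>().begin());

template <class T> using has_data  = is_detected<data_f, T>;
template <class T> using has_begin = is_detected<begin_f, T>;

} // namespace demo
```

With these definitions `demo::has_data<std::vector<int>>::value` is true, while `std::list<int>` has iterators but no `data()`, so it would be routed to the scatter/gather path.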

C++ Linear Algebra Systems calling BLAS/LAPACK: specialized containers

are a dedicated category: they all must provide a mechanism to pass data to and receive data from BLAS routines, but the naming varies from system to system.

The differences can be mitigated with a combination of

  • type traits
  • feature detection idiom

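| library | direct access | vector size |
|---------|---------------|-------------|
| arma    | memptr()      | n_elem      |
| eigen   | data()        | size()      |
| blaze   | data()        | n/a         |
| blitz   | data()        | size()      |
| itpp    | _data()       | length()    |
| ublas   | data().begin()| n/a         |
| dlib    | (0,0)         | size()      |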
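
A small sketch of how the naming differences in the table can be hidden behind one accessor, combining member detection with `if constexpr`. `demo::arma_like_t` is a hypothetical stand-in that only mimics the `memptr()` naming, so the example compiles without any linear algebra library; `demo::data_ptr` is likewise an illustrative name, not an H5CPP function.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace demo { // hypothetical names for illustration

// stand-in for a container that follows the Armadillo naming
struct arma_like_t {
    double buffer[4] = {0, 0, 0, 0};
    double* memptr() { return buffer; }
};

// detect object.memptr()
template <class T, class = void> struct has_memptr : std::false_type {};
template <class T>
struct has_memptr<T, std::void_t<decltype(std::declval<T&>().memptr())>>
    : std::true_type {};

// detect object.data()
template <class T, class = void> struct has_data : std::false_type {};
template <class T>
struct has_data<T, std::void_t<decltype(std::declval<T&>().data())>>
    : std::true_type {};

// one accessor for both naming conventions
template <class T>
auto data_ptr(T& object) {
    if constexpr (has_data<T>::value)
        return object.data();
    else if constexpr (has_memptr<T>::value)
        return object.memptr();
    else
        static_assert(has_data<T>::value, "no known direct-access method");
}

} // namespace demo
```

Supporting another library then means adding one more detector and one more branch, while the I/O code keeps calling `data_ptr`.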

H5CPP

  • LLVM based static reflection tool
  • C++ templates with CRUD like operators

take a header file with POD struct


typedef unsigned long long int MyUInt;
namespace sn {
	namespace example {
		struct Record {
			MyUInt               field_01;
			char                 field_02;
			double            field_03[3];
			other::Record field_04[4];
		};
	}
}
  • typedefs are fine
  • nested namespaces are OK
  • mapped to : H5T_NATIVE_CHAR
  • H5Tarray_create(H5T_NATIVE_DOUBLE,1, ... )
  • first `other::Record` is parsed: type_hid_t = ...
  • then the generated type is used: H5Tarray_create(type_hid_t, ...)

write your program

write your cpp program as if `generated.h` were already written 
#include "some_header_file.h"
#include <h5cpp/core>
	#include "generated.h"
#include <h5cpp/io>
int main(){
	std::vector<sn::example::Record> stream =
		...
	h5::fd_t fd = h5::create("example.h5",H5F_ACC_TRUNC);
	h5::pt_t pt = h5::create<sn::example::Record>(
		fd, "stream of struct",
		h5::max_dims{H5S_UNLIMITED,7}, h5::chunk{4,7} | h5::gzip{9} );
	...
}
  • sandwich the not-yet existing `generated.h`
  • write the translation unit (TU) as usual
  • using the POD type with one of the H5CPP CRUD like operators:
      h5::create | h5::write | h5::read | h5::append | h5::acreate | h5::awrite | h5::aread  
    will trigger the `h5cpp` compiler to generate code

A header file with HDF5 Compound Type descriptors:

#ifndef H5CPP_GUARD_ErRrk
#define H5CPP_GUARD_ErRrk
namespace h5{
    template<> hid_t inline register_struct<sn::example::Record>(){
        hsize_t at_00_[] ={7};            hid_t at_00 = H5Tarray_create(H5T_NATIVE_FLOAT,1,at_00_);
        hsize_t at_01_[] ={3};            hid_t at_01 = H5Tarray_create(H5T_NATIVE_DOUBLE,1,at_01_);
        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (sn::typecheck::Record));
        H5Tinsert(ct_00, "_char", HOFFSET(sn::typecheck::Record,_char),H5T_NATIVE_CHAR);
        ...
        H5Tclose(at_03); H5Tclose(at_04); H5Tclose(at_05);
        return ct_02;
    };
}
H5CPP_REGISTER_STRUCT(sn::example::Record);
#endif
  • random include guards
  • within namespace
  • template specialization for h5::operators
  • compound types are recursively created
  • calls the template specialization when h5::operator needs it
rich set of HDF5 property lists

Comma Separated Values to HDF5

#include <cstring>         // strncpy
#include "csv.h"
#include "struct.h"
#include <h5cpp/core>      // has handle + type descriptors
	#include "generated.h" // uses type descriptors
#include <h5cpp/io>        // uses generated.h + core 

int main(){
	h5::fd_t fd = h5::create("output.h5",H5F_ACC_TRUNC);
	h5::ds_t ds = h5::create<input_t>(fd,  "simple approach/dataset.csv",
				 h5::max_dims{H5S_UNLIMITED}, h5::chunk{10} | h5::gzip{9} );
	h5::pt_t pt = ds;
	ds["data set"] = "monroe-county-crash-data2003-to-2015.csv";
	ds["csv parser"] = "https://github.com/ben-strasser/fast-cpp-csv-parser";

	constexpr unsigned N_COLS = 5;
	io::CSVReader<N_COLS> in("input.csv"); // the file may have more columns per row; we read only these 5
	in.read_header(io::ignore_extra_column, "Master Record Number", "Hour", "Reported_Location","Latitude","Longitude");
	input_t row;                           // buffer to read line by line
	char* ptr;      // indirection, as `read_row` doesn't take array directly
	while(in.read_row(row.MasterRecordNumber, row.Hour, ptr, row.Latitude, row.Longitude)){
		strncpy(row.ReportedLocation, ptr, STR_ARRAY_SIZE); // defined in struct.h
		h5::append(pt, row);
	}
}
  • CSV header only library by Ben Strasser, and a type definition for the record
  • h5cpp includes
  • translation unit, the program
  • create HDF5 container, and dataset
  • decorate it with attributes
  • do I/O operations within a loop
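
One detail worth guarding in the loop above: `strncpy` does not write a terminating `'\0'` when the source is at least as long as the destination. A minimal sketch of a safer copy into a fixed-size field follows; `demo::copy_fixed` is a hypothetical helper, not part of H5CPP or the CSV parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace demo { // hypothetical helper for illustration

// copy a C string into a fixed-size char array, always leaving it terminated
template <std::size_t N>
void copy_fixed(char (&dst)[N], const char* src) {
    static_assert(N > 0, "destination must hold at least the terminator");
    assert(src != nullptr);
    std::strncpy(dst, src, N - 1); // pads with '\0' when src is shorter
    dst[N - 1] = '\0';             // guarantees termination when src is longer
}

} // namespace demo
```

Because the array size is deduced from the destination, a call such as `demo::copy_fixed(row.ReportedLocation, ptr)` would not need the size constant repeated at the call site, assuming the field is declared as a `char` array.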

Attributes:

do the right thing, and come with an easy-to-use operator. Here are some examples:

h5::ds_t ds = h5::write(fd,"some dataset with attributes", ... );
ds["att_01"] = 42 ;
ds["att_02"] = {1.,3.,4.,5.};
ds["att_03"] = {'1','3','4','5'};
ds["att_04"] = {"alpha", "beta","gamma","..."};
ds["att_05"] = "const char[N]";
ds["att_06"] = u8"const char[N]áééé";
ds["att_07"] = std::string( "std::string");
ds["att_08"] = record; // pod/compound datatype
ds["att_09"] = vector; // vector of pod/compound type
ds["att_10"] = matrix; // linear algebra object
  • obtain a handle by h5::create | h5::open | h5::write
  • rank N objects, even compound types when the h5cpp compiler is used
  • arrays of various element types
  • mapped to rank 0 variable length character types

FAQ

  • What is H5CPP? It saves and loads data very fast
  • Who uses H5CPP and/or HDF5?
    • Research institutions such as Fermi National Laboratory
    • FinTech: algorithmic trading, quantitative analysis
    • IoT (Internet Of Things): image frames, sensor networks, etc...
    • Research labs in machine learning, computer vision
    • Pharmaceuticals: see [Allotrope foundation](https://www.allotrope.org/)
  • What sort of data can be saved?
    • data: scalars, vectors, matrices, N dimensional hypercubes
    • text
  • How is it different from SQL, NoSQL, etc.? HDF5 is much faster, has much lower latency, and scales to supercomputers
  • Is the format new? I have never heard of it. It has been around since the early 90's, funded by NASA. It is like salt: you only notice when it is missing. Major statistical platforms such as MATLAB, R, Python and Julia can save data in HDF5
  • How can it help me, my team, my company? H5CPP makes things simpler, without compromise. If interested in details Steve is here to help...
  • How does it work? It is a technical question, Steve will take it from here...
  • How can I contact you? everything is on http://H5CPP.org
  • How much does it cost? It is free to use, including in commercial applications!
  • Is The HDF Group present? Gerd Heber will be here from 20:45 CEST