Big Data/CumulusRDF

CumulusRDF is a distributed RDF store that keeps its RDF triples in the key-value store Apache Cassandra.

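Each RDF triple consists of a subject (S), a property (P) and an object (O). A set of RDF triples forms a graph in which P is a labelled, directed edge that starts at S and leads to O. The graph can be queried with eight basic graph patterns (BGPs), one for each combination of fixed and variable positions in a triple. A variable position is written as ?, so (sp?) asks for all objects of a given subject and property. To answer these queries efficiently, CumulusRDF provides three indices: SPO, POS and OSP. An index can answer a pattern when the fixed positions of the pattern form a prefix of the index's key order. The BGPs and the index required to answer each of them are as follows:
 * (spo): SPO, POS or OSP
 * (sp?): SPO
 * (?po): POS
 * (s?o): OSP
 * (?p?): POS
 * (s??): SPO
 * (??o): OSP
 * (???): SPO, POS or OSP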
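The choice of index can be sketched as a prefix check: an index is usable when the fixed positions of the pattern are exactly the leading positions of its key order. The following Python sketch is only an illustration. The function name `usable_indices` and the use of `None` for a variable position are assumptions of this example, not part of CumulusRDF.

```python
# Key order of each index named in the text.
INDEX_ORDERS = {"SPO": "spo", "POS": "pos", "OSP": "osp"}


def usable_indices(s=None, p=None, o=None):
    """Return the indices that can answer the pattern (s, p, o).

    None marks a variable position (hypothetical convention for this sketch).
    An index qualifies if its first n key positions are all fixed, where n is
    the number of fixed positions in the pattern.
    """
    fixed = {"s": s is not None, "p": p is not None, "o": o is not None}
    n = sum(fixed.values())
    return [
        name
        for name, order in INDEX_ORDERS.items()
        if all(fixed[position] for position in order[:n])
    ]


# Enumerate all eight patterns with placeholder values.
for s in ("s", None):
    for p in ("p", None):
        for o in ("o", None):
            pattern = "".join(x if x is not None else "?" for x in (s, p, o))
            print(f"({pattern}) -> {usable_indices(s, p, o)}")
```

With three indices every pattern has at least one usable index, which is why no further orderings such as PSO or OPS are needed for these lookups.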

CumulusRDF supports two storage representations for RDF triples of the form (s, p, o):
 * Hierarchical layout
   * { s : { p : { o : - } } }, { o : { s : { p : - } } } and { p : { o : { s : - } } } are inserted.
 * Flat layout
   * { s : { po : - } }, { o : { sp : - } }, { po : { s : - } } and { po : { 'p' : p } } are inserted.
   * The third key-key-value entry uses po rather than p as its key because Apache Cassandra stores all entries with the same key on the same data node. Some properties, such as rdf:type, are used very often, so using p alone as the key would lead to an unbalanced load distribution. Therefore, the property concatenated with the object is used as the key.
   * The fourth entry is required to find all triples with the same property, i.e., to answer (?p?). It is used in a secondary index that maps each value p to all keys po in which this value occurs.
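The two layouts can be mimicked with nested Python dictionaries. This is a rough sketch under stated assumptions, not CumulusRDF's implementation. The dictionaries stand in for Cassandra rows and columns, `None` stands in for the empty value written as `-` above, and the maps `PO_P` and `P_INDEX` are hypothetical stand-ins for the `'p'` column and for the secondary index that Cassandra would maintain itself. All function names are invented for this example.

```python
from collections import defaultdict


def nested():
    """Arbitrarily deep dictionary, standing in for nested key-value rows."""
    return defaultdict(nested)


def new_hierarchical_store():
    return {"SPO": nested(), "OSP": nested(), "POS": nested()}


def insert_hierarchical(store, s, p, o):
    # Three nested entries per triple; the innermost value is empty.
    store["SPO"][s][p][o] = None
    store["OSP"][o][s][p] = None
    store["POS"][p][o][s] = None


def new_flat_store():
    return {
        "SPO": defaultdict(dict),     # key s,  column po
        "OSP": defaultdict(dict),     # key o,  column sp
        "POS": defaultdict(dict),     # key po, column s
        "PO_P": {},                   # key po, column 'p' with value p
        "P_INDEX": defaultdict(set),  # secondary index: p -> keys po
    }


def insert_flat(store, s, p, o):
    store["SPO"][s][(p, o)] = None
    store["OSP"][o][(s, p)] = None
    store["POS"][(p, o)][s] = None
    # Fourth entry: record p under the key po and index it by value.
    store["PO_P"][(p, o)] = p
    store["P_INDEX"][p].add((p, o))


def match_property_flat(store, p):
    """Answer (?p?) by following the secondary index to every key po."""
    return sorted(
        (s, p, o)
        for (_, o) in store["P_INDEX"].get(p, ())
        for s in store["POS"][(p, o)]
    )


triples = [
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("acme", "rdf:type", "Company"),
    ("alice", "knows", "bob"),
]
flat = new_flat_store()
hierarchical = new_hierarchical_store()
for triple in triples:
    insert_flat(flat, *triple)
    insert_hierarchical(hierarchical, *triple)

print(match_property_flat(flat, "rdf:type"))
```

In this sketch the rdf:type triples end up under two different keys in the flat POS map, one per object, whereas the hierarchical POS map keeps all of them under the single key rdf:type. That difference is the load-balancing argument made above.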