The gCube System - Storage Layer
------------------------------------------------------------

This work is partially funded by the European Commission in the
context of the D4Science project (www.d4science.eu), under the 1st call of FP7 IST priority.

Authors
-------

* Michael Springmann (michael.springmann@unibas.ch) Departement Informatik, Universität Basel
* Diego Milano (diego.milano@unibas.ch) Departement Informatik, Universität Basel
* Federico De Faveri (federico.defaveri@isti.cnr.it), CNR Pisa, Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo"

Version and Release Date
------------------------

v. 1.3.2

Description
-----------
This library is the result of splitting the Content Management Library.
The following is an older description from the DILIGENT project:

BASE LAYER
DILIGENT Content Management will rest upon a subset of gLite storage services. 
Namely, gLite Storage Elements (SE), e.g. DPM, and file transfer service for 
relocating physical files across the Grid will be used. In addition, we 
introduced an alternative representation of file content as BLOB fields into 
our storage model to meet our performance and functionality demands. 

Storage operations may be parameterized to choose between gLite-based and 
BLOB-based file storage. In general, gLite storage facilities should be 
favored over the BLOB representation to (1) take advantage of all benefits 
that automatically come with Grid storage (distribution features, replication, 
support for fancy storage hardware etc.) and (2) make DILIGENT content 
accessible for non-DILIGENT software that interacts with the gLite middleware. 
We are confident that the current performance restrictions of gLite, which 
limit its applicability for interactive use, will be gradually lifted in 
subsequent releases. The proposed database-backed storage comes with a few 
restrictions that are important to keep in mind. 

Those are (1) "fat" nodes in terms of the required middleware (i.e. database 
systems) that must be installed on each Content/Storage Management node and (2) 
possibly dedicated hardware to cope with the vast amount of data. 
Moreover, (3) database security mechanisms have to fit in with the 
credentials-based security of gLite file storage.
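
The parameterized choice between gLite-based and BLOB-based storage can be 
illustrated with a minimal sketch. It is not the library's API: the names 
Backend, InMemoryStore and StorageManager are hypothetical, and both backends 
are in-memory stand-ins that never contact gLite or a database. 

```python
from enum import Enum


class Backend(Enum):
    GLITE = "glite"   # file kept in a gLite Storage Element
    BLOB = "blob"     # content kept in a database BLOB field


class InMemoryStore:
    """Stand-in for a real backend (assumption): keeps payloads in a dict."""

    def __init__(self):
        self.data = {}

    def put(self, key, payload):
        self.data[key] = payload

    def get(self, key):
        return self.data[key]


class StorageManager:
    """Hypothetical facade that routes each operation to one backend."""

    def __init__(self):
        self.backends = {backend: InMemoryStore() for backend in Backend}
        self.placement = {}   # object id -> backend holding its content

    def store(self, object_id, payload, backend=Backend.GLITE):
        # gLite is the default, mirroring the recommendation in the text.
        self.backends[backend].put(object_id, payload)
        self.placement[object_id] = backend

    def load(self, object_id):
        return self.backends[self.placement[object_id]].get(object_id)


manager = StorageManager()
manager.store("report", b"large file")                     # default: gLite
manager.store("thumbnail", b"small", backend=Backend.BLOB)
print(manager.placement["report"].value)      # glite
print(manager.placement["thumbnail"].value)   # blob
```

Recording the placement per object lets a caller read content back without 
knowing which representation was chosen when it was stored.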

STORAGE LAYER
The Storage Management Layer builds on top of the Base Layer. In detail, every 
storage node will host at least one database instance. 
Each information object may carry a list of storage properties represented 
as simple key-type-value associations. These storage properties are atomic, 
whereas complex metadata (such as indexes or multimedia features) may instead 
be represented as a separate information object that is associated with the 
object. Plain storage properties should be favored whenever querying is a 
requirement or attribute-level access to certain metadata fields is to be 
provided within the storage manager.
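
As an illustration of key-type-value storage properties and of the 
attribute-level access they allow, here is a minimal sketch. All class, 
function and property names are hypothetical, and the type is kept as a plain 
string because the type vocabulary is not specified in this description. 

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StorageProperty:
    """One atomic key-type-value association (hypothetical model)."""
    key: str
    type: str    # e.g. "string"; the real set of types is an assumption
    value: str


@dataclass
class InformationObject:
    object_id: str
    properties: Dict[str, StorageProperty] = field(default_factory=dict)

    def set_property(self, key: str, type_: str, value: str) -> None:
        self.properties[key] = StorageProperty(key, type_, value)

    def get_property(self, key: str) -> Optional[str]:
        prop = self.properties.get(key)
        return None if prop is None else prop.value


def find_by_property(objects: List[InformationObject],
                     key: str, value: str) -> List[str]:
    """Return the ids of all objects whose property `key` equals `value`."""
    return [o.object_id for o in objects if o.get_property(key) == value]


doc = InformationObject("doc-1")
doc.set_property("mime-type", "string", "text/html")
img = InformationObject("img-1")
img.set_property("mime-type", "string", "image/png")
print(find_by_property([doc, img], "mime-type", "image/png"))   # ['img-1']
```

Because each property is a flat, atomic value, a query like the one above 
needs no knowledge of the object's content.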

An information object may be an abstraction of a file-based document 
(which is stored in gLite SEs), a BLOB-field representation of content, 
a simple document with no associated content (besides storage properties 
and/or object references), or a complex object that references other 
objects. Note that complex objects are also permitted to have content 
(file, BLOB) associated with them. The storage manager does not maintain 
consistency between the document content and the explicit references to 
other objects. For instance, an HTML document that includes a number of 
images may be modeled as a complex object that provides references to 
the information objects containing the images. Alternatively, any XML 
(including HTML) document can be made persistent within storage management 
without having the complete XML file associated with the respective 
information object as binary content. To do so, one can exploit the 
relationship attributes role (of the relationship), position (of the 
contained object within the containing object), and cardinality. This 
avoids duplicate association information (implicit in the XML file and 
explicit as persistent object relationships), but comes at the price of 
providing parsers for XML documents (and possibly other container formats) 
that break the structures up into their parts. 
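
The kinds of information object described above can be sketched as a single 
type with optional content and optional references. This is an illustrative 
model only: ContentKind, Content, the describe() helper and the location 
strings are hypothetical and do not come from the library. 

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ContentKind(Enum):
    FILE = "file"   # content held as a file in a gLite SE
    BLOB = "blob"   # content held in a database BLOB field


@dataclass
class Content:
    kind: ContentKind
    location: str   # hypothetical: a file identifier or a BLOB row key


@dataclass
class InformationObject:
    object_id: str
    content: Optional[Content] = None
    references: List[str] = field(default_factory=list)   # referenced ids

    def is_complex(self) -> bool:
        # An object is complex as soon as it references other objects.
        return bool(self.references)

    def describe(self) -> str:
        shape = "complex" if self.is_complex() else "simple"
        stored = self.content.kind.value if self.content else "no content"
        return f"{shape}, {stored}"


# An HTML page stored as a BLOB that also references its two images.
page = InformationObject("page", Content(ContentKind.BLOB, "row-17"),
                         ["img-a", "img-b"])
image = InformationObject("img-a", Content(ContentKind.FILE, "example/img-a.png"))
folder = InformationObject("folder", references=["page"])
note = InformationObject("note")

for obj in (page, image, folder, note):
    print(obj.object_id, "->", obj.describe())
```

Keeping content and references independent of each other is what allows a 
complex object to carry its own file or BLOB in addition to its references. 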

An object reference "links" two information objects. Each object may (1) 
reference many other objects and (2) be referenced by many objects 
(m-n relationship). Each link is qualified by a role name, a reference type, 
a position tag (to reconstruct a view of the referring object by composing it 
out of its referenced objects), and a delete propagation rule (consulted upon 
removal of the referring object to determine whether to automatically delete 
the referred object). For instance, a reference type may be "indexes" with a 
role name that gives additional information, like "full-text index". The 
position attribute helps in representing an object that references other 
objects, like an aggregate made up of components that have to be fitted 
together in a certain order. The delete propagation rule provides removal 
constraints to the storage manager (i.e. the Storage Management service). 
In detail, different reference types may impose different rules for deleting a 
referenced object. For instance, component objects (appearing as collection 
members) will not be deleted if the collection is removed. In contrast, an 
associated index will be removed.
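
The reference attributes and the delete propagation behaviour can be sketched 
as follows. This is a minimal in-memory model under stated assumptions, not 
the Storage Management service: ObjectReference, ReferenceStore, the field 
names and the reference types "contains" and "has-index" are all hypothetical. 

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ObjectReference:
    source: str              # id of the referring object
    target: str              # id of the referred object
    ref_type: str
    role: str
    position: int
    propagate_delete: bool   # delete the target when the source is removed?


class ReferenceStore:
    def __init__(self) -> None:
        self.objects: Set[str] = set()
        self.refs: List[ObjectReference] = []

    def add_object(self, object_id: str) -> None:
        self.objects.add(object_id)

    def link(self, ref: ObjectReference) -> None:
        self.refs.append(ref)

    def ordered_targets(self, source: str, ref_type: str) -> List[str]:
        """Referenced ids of one type, in the order given by `position`."""
        outgoing = [r for r in self.refs
                    if r.source == source and r.ref_type == ref_type]
        return [r.target for r in sorted(outgoing, key=lambda r: r.position)]

    def delete(self, object_id: str) -> None:
        """Remove an object and follow references flagged for propagation."""
        if object_id not in self.objects:
            return   # already removed; also guards against reference cycles
        self.objects.discard(object_id)
        outgoing = [r for r in self.refs if r.source == object_id]
        # Drop every reference that touches the removed object.
        self.refs = [r for r in self.refs
                     if r.source != object_id and r.target != object_id]
        for ref in outgoing:
            if ref.propagate_delete:
                self.delete(ref.target)


store = ReferenceStore()
for oid in ("collection", "doc-1", "doc-2", "index"):
    store.add_object(oid)
store.link(ObjectReference("collection", "doc-2", "contains", "member", 2, False))
store.link(ObjectReference("collection", "doc-1", "contains", "member", 1, False))
store.link(ObjectReference("collection", "index", "has-index",
                           "full-text index", 0, True))

print(store.ordered_targets("collection", "contains"))   # ['doc-1', 'doc-2']
store.delete("collection")
print(sorted(store.objects))                              # ['doc-1', 'doc-2']
```

Removing the collection leaves its members in place, while the index, whose 
reference carries the propagation flag, is removed together with it.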

Download information
--------------------

Source code is available from SVN:
http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/content-management/StorageLayer

Binaries can be downloaded from:


Documentation 
-------------
Documentation is available on-line from the Projects Documentation Wiki:
https://gcube.wiki.gcube-system.org/gcube/index.php/Storage_Management

Test-suite documentation is available on-line from the Projects Documentation Wiki:


Licensing
---------

This software is licensed under the terms you may find in the file named "LICENSE" in this directory.
