author Matthias Baumgartner <dev@igsor.net> 2023-03-04 17:05:47 +0100
committer Matthias Baumgartner <dev@igsor.net> 2023-03-04 17:05:47 +0100
commit 87004fa65cc4833cfdbd9a24ba149123c7020edb
tree f7cae71b684b49f2eecd720bda6a2995438b4aa6 /doc
parent 4fead04055be4967d9ea3b24ff61fe37a93108dd
documentation
Diffstat (limited to 'doc')
-rw-r--r-- doc/source/architecture.rst 87
-rw-r--r-- doc/source/concepts.rst 98
-rw-r--r-- doc/source/index.rst 75
-rw-r--r-- doc/source/installation.rst 46
4 files changed, 306 insertions, 0 deletions
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
new file mode 100644
index 0000000..4cca49a
--- /dev/null
+++ b/doc/source/architecture.rst
@@ -0,0 +1,87 @@
+
+Architecture
+============
+
+The BSFS stack can be coarsely divided into four parts (see the image below).
+
+* Envelope: Essentials and utils used throughout the whole codebase.
+* Front: End-user applications and APIs.
+* Center: The core interfaces and functionality.
+* Back: The triple store backends.
+
+Details of these components are given in the sections below.
+
+
+.. image:: _static/arch_light.png
+ :class: only-light
+
+.. image:: _static/arch_dark.png
+ :class: only-dark
+
+
+Envelope
+--------
+
+Most notably, the envelope covers the :class:`Schema <bsfs.schema.schema.Schema>` and the :mod:`Query syntax trees (AST) <bsfs.query.ast>`.
+Both are essential to all parts of the BSFS stack.
+For example, the schema is specified by the user via the :func:`Migrate <bsfs.apps.migrate.main>` command, checked and extended by the :class:`Graph <bsfs.graph.graph.Graph>`, and ultimately stored by a :class:`Triple Store backend <bsfs.triple_store.base.TripleStoreBase>`.
+Similarly, the Query AST may be provided by a caller and is translated to a database query by a backend.
+The envelope also contains some classes to handle URIs:
+:class:`URI <bsfs.utils.uri.URI>` defines the URI base class,
+:class:`Namespace <bsfs.namespace.Namespace>` provides shortcuts to generate URIs, and
+:mod:`UUID <bsfs.utils.uuid>` is used to generate unique URIs.
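+
+To illustrate what such URI helpers do, here is a minimal sketch that uses only the standard library.
+``SketchNamespace`` and ``unique_uri`` are hypothetical stand-ins written for this example.
+They do not reproduce the actual :class:`Namespace <bsfs.namespace.Namespace>` or
+:mod:`UUID <bsfs.utils.uuid>` implementations::
+
+    from uuid import uuid4
+
+
+    class SketchNamespace:
+        """Hypothetical stand-in for a URI namespace: attribute access appends a name."""
+
+        def __init__(self, prefix: str) -> None:
+            self.prefix = prefix
+
+        def __getattr__(self, name: str) -> str:
+            # Only called for attributes not found otherwise, e.g. bse.author.
+            if name.startswith('__'):
+                raise AttributeError(name)
+            return self.prefix + name
+
+
+    def unique_uri(prefix: str) -> str:
+        """Hypothetical helper: append a random UUID (as hex) to a URI prefix."""
+        return prefix + uuid4().hex
+
+
+    bse = SketchNamespace('http://schema.bsfs.io/Entity#')
+    print(bse.author)  # http://schema.bsfs.io/Entity#author
+    print(unique_uri('http://example.com/file#'))  # random suffix on every call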
+
+
+Front
+-----
+
+The front consists of exposed interfaces such as end-user applications or APIs,
+and all utils needed to offer this functionality.
+See :mod:`bsfs.apps` and :mod:`bsfs.front`.
+
+
+Center
+------
+
+The heart of BSFS is grouped around the :mod:`bsfs.graph` module.
+These classes provide the interface to navigate and manipulate the file graph
+in a safe and programmer-friendly manner.
+Some of them are indirectly exposed through the higher-level APIs.
+
+BSFS rests on two core design principles: a focus on nodes, and batch processing.
+They are realized in the Graph and Nodes classes.
+The :class:`Graph class <bsfs.graph.graph.Graph>` manages the graph as a whole,
+and offers methods to get a specific set of Nodes.
+In turn, the :class:`Nodes class <bsfs.graph.nodes.Nodes>` represents such a set of nodes,
+and performs operations on the whole node set at once.
+In addition, the :mod:`bsfs.graph` module provides some syntactic sugar.
+
+Example::
+
+ # Open a file graph.
+ from bsfs import Open, ns
+ graph = Open(...)
+ # Get all nodes of type File.
+ nodes = graph.all(ns.bsfs.File)
+ # Set the author of all nodes at once.
+ nodes.set(ns.bse.author, 'Myself')
+ # Retrieve the author of all nodes at once.
+ set(nodes.get(ns.bse.author, node=False))
+ # Same as above, but shorter (attribute-style access).
+ set(nodes.author(node=False))
+
+
+Back
+----
+
+There are various graph databases (e.g., `RDFLib`_, `Blazegraph`_, `Titan`_),
+and it would be foolish to replicate the work that others have done.
+Instead, we use third-party stores that take care of how to store and manage the data.
+The :class:`Backend base class <bsfs.triple_store.base.TripleStoreBase>` defines the
+interface to integrate any such third-party store into BSFS.
+Besides storing the data, a triple store backend also needs to track the current schema.
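+
+The sketch below gives a rough idea of such an interface. It defines an abstract base class and a toy in-memory implementation.
+All class and method names (``SketchBackend``, ``InMemoryBackend``, ``migrate``, ``insert``, ``subjects``)
+are hypothetical. They do not mirror the actual methods of
+:class:`TripleStoreBase <bsfs.triple_store.base.TripleStoreBase>`.
+A real backend would delegate to a third-party store instead of a Python set::
+
+    from abc import ABC, abstractmethod
+    from typing import Iterable, Set, Tuple
+
+    Triple = Tuple[str, str, str]
+
+
+    class SketchBackend(ABC):
+        """Hypothetical backend interface: store triples and track the current schema."""
+
+        @property
+        @abstractmethod
+        def schema(self) -> str:
+            """Return the current schema (e.g., as a Turtle string)."""
+
+        @abstractmethod
+        def migrate(self, schema: str) -> None:
+            """Replace the current schema with an updated one."""
+
+        @abstractmethod
+        def insert(self, triples: Iterable[Triple]) -> None:
+            """Store a whole batch of triples at once."""
+
+        @abstractmethod
+        def subjects(self, predicate: str, obj: str) -> Set[str]:
+            """Return all subjects linked to obj via predicate."""
+
+
+    class InMemoryBackend(SketchBackend):
+        """Toy backend that keeps the schema and all triples in memory."""
+
+        def __init__(self) -> None:
+            self._schema = ''
+            self._triples: Set[Triple] = set()
+
+        @property
+        def schema(self) -> str:
+            return self._schema
+
+        def migrate(self, schema: str) -> None:
+            self._schema = schema
+
+        def insert(self, triples: Iterable[Triple]) -> None:
+            self._triples.update(triples)
+
+        def subjects(self, predicate: str, obj: str) -> Set[str]:
+            return {s for s, p, o in self._triples if p == predicate and o == obj}
+
+
+    store = InMemoryBackend()
+    store.migrate('bse:author rdfs:subClassOf bsfs:Predicate .')
+    store.insert([('file#1', 'bse:author', 'me'), ('file#2', 'bse:author', 'you')])
+    print(store.schema)
+    print(store.subjects('bse:author', 'me'))  # {'file#1'}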
+
+
+.. _RDFLib: https://rdflib.readthedocs.io/en/stable/index.html
+.. _Blazegraph: https://blazegraph.com/
+.. _Titan: http://titan.thinkaurelius.com/
diff --git a/doc/source/concepts.rst b/doc/source/concepts.rst
new file mode 100644
index 0000000..9c2ed43
--- /dev/null
+++ b/doc/source/concepts.rst
@@ -0,0 +1,98 @@
+
+Core concepts
+=============
+
+The following sections present a few core concepts that help in understanding BSFS operations and its codebase.
+
+
+Graph storage
+-------------
+
+`RDF`_ describes a network or graph, such as the file graph, as a set of
+*(subject, predicate, object)* triples.
+*Subject* is the identifier of the source node,
+*object* is the identifier of the target node (or a literal value),
+and *predicate* is the type of relation between the source node and the target.
+As suggested by `RDF`_, we use URIs to identify nodes and predicates.
+For example, a triple that assigns me as the author of a file could look like this::
+
+ <http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> <http://example.com/me>
+
+Alternatively, the *object* could be a literal value ("me")::
+
+ <http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> "me"
+
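+In Python, the two example triples above could be modelled as plain tuples,
+with a small wrapper type that tells literal values apart from node URIs.
+This is only an illustrative sketch. The ``Lit`` class and the ``objects`` function are hypothetical.
+`RDFLib`_, for instance, provides ``URIRef`` and ``Literal`` classes for the same purpose::
+
+    from typing import NamedTuple
+
+
+    class Lit(NamedTuple):
+        """Hypothetical wrapper marking a literal value (as opposed to a node URI)."""
+        value: str
+
+
+    FILE = 'http://example.com/file#1234'
+    AUTHOR = 'https://bsfs.io/schema/Entity#author'
+
+    # A graph is just a set of (subject, predicate, object) triples.
+    triples = {
+        (FILE, AUTHOR, 'http://example.com/me'),  # object is another node
+        (FILE, AUTHOR, Lit('me')),                # object is a literal value
+    }
+
+
+    def objects(graph, subject, predicate):
+        """Return all objects linked to subject via predicate."""
+        return {o for s, p, o in graph if s == subject and p == predicate}
+
+
+    for obj in objects(triples, FILE, AUTHOR):
+        print('literal' if isinstance(obj, Lit) else 'node', obj)
+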
+There are a number of graph databases that support this or an analogous paradigm,
+such as `RDFLib`_, `Blazegraph`_, `TypeDB`_, `Titan`_,
+and `many more <https://en.wikipedia.org/wiki/Graph_database#List_of_graph_databases>`_.
+BSFS uses such a third-party graph database to store its file graph.
+
+As usual in database systems,
+we have to distinguish schema data (which governs the structure of the storage)
+from instance data (the actual database content).
+Just as relational database systems keep their schema in catalog tables alongside the data tables,
+both kinds of data can be represented as triples
+and subsequently stored within the same graph storage
+(although one might need to separate them logically).
+In BSFS, we employ an explicit schema (see next section) that is managed alongside the data.
+
+
+
+Schema
+------
+
+BSFS ensures consistency across multiple distributed client applications
+by maintaining an explicit schema that governs node types and predicates.
+Furthermore, exposing the schema allows clients to run a number of compatibility and validity checks
+locally, and a graph database may use the schema to optimize its storage or operations.
+
+In BSFS, the schema is initially provided by the system administrator
+(usually in the `Turtle`_ format)
+and subsequently stored by the backend.
+The default schema defines three root types
+(``bsfs:Node``, ``bsfs:Predicate``, and ``bsfs:Literal``),
+and BSFS expects any node, literal, or predicate to be derived from these roots.
+
+For example, a node type, a literal type, and a predicate can be defined like so::
+
+ # define some abbreviations
+ prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
+ prefix xsd: <http://www.w3.org/2001/XMLSchema#>
+ prefix bsfs: <http://schema.bsfs.io/>
+ prefix bse: <http://schema.bsfs.io/Entity#>
+
+ # define a node type
+ bsfs:Entity rdfs:subClassOf bsfs:Node .
+
+ # define a literal type
+ xsd:string rdfs:subClassOf bsfs:Literal .
+
+ # define a predicate ("author of a node")
+ bse:author rdfs:subClassOf bsfs:Predicate ;
+ rdfs:domain bsfs:Entity ;
+ rdfs:range xsd:string .
+
+BSFS checks all requests and rejects queries or operations that violate the schema.
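+
+The following sketch shows, in a strongly simplified form, how a domain and range check against the schema above could work.
+The dictionaries stand in for a parsed schema. Like the ``is_subclass`` and ``check`` helpers, they are hypothetical
+and not part of the BSFS API. Note that a subclass of ``bsfs:Entity`` would also pass the domain check::
+
+    # Hypothetical, simplified view of the Turtle schema above.
+    # Each class or predicate maps to its parent (rdfs:subClassOf).
+    PARENTS = {
+        'bsfs:Entity': 'bsfs:Node',
+        'xsd:string': 'bsfs:Literal',
+        'bse:author': 'bsfs:Predicate',
+    }
+    # Each predicate maps to its (rdfs:domain, rdfs:range) pair.
+    SIGNATURES = {
+        'bse:author': ('bsfs:Entity', 'xsd:string'),
+    }
+
+
+    def is_subclass(cls, ancestor):
+        """Return True if cls equals ancestor or derives from it via PARENTS."""
+        while cls is not None:
+            if cls == ancestor:
+                return True
+            cls = PARENTS.get(cls)
+        return False
+
+
+    def check(subject_type, predicate, object_type):
+        """Raise ValueError if a triple's types violate the predicate's domain or range."""
+        if predicate not in SIGNATURES:
+            raise ValueError(f'unknown predicate: {predicate}')
+        domain, range_ = SIGNATURES[predicate]
+        if not is_subclass(subject_type, domain):
+            raise ValueError(f'{subject_type} is not in the domain of {predicate}')
+        if not is_subclass(object_type, range_):
+            raise ValueError(f'{object_type} is not in the range of {predicate}')
+
+
+    check('bsfs:Entity', 'bse:author', 'xsd:string')  # accepted: no exception
+    try:
+        check('bsfs:Node', 'bse:author', 'xsd:string')  # bsfs:Node is too general
+    except ValueError as err:
+        print(err)  # bsfs:Node is not in the domain of bse:author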
+
+
+Querying
+--------
+
+BSFS at its core is not much more than a translator from a user query into a graph database query.
+It operates directly on three abstract syntax trees (ASTs),
+one each for fetch, search, and sort queries.
+By not using an existing query language,
+we avoid an unnecessary and possibly expensive parsing step.
+Some routines create an AST internally (e.g., :func:`bsfs.graph.graph.Graph.all`),
+while others accept a user-defined AST (e.g., :func:`bsfs.graph.graph.Graph.get`).
+One way or another, the AST is validated against the schema,
+and access control conditions are added.
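+
+The sketch below makes the idea of translating an AST more concrete. It defines three hypothetical AST node types
+and turns a filter expression into SPARQL graph patterns.
+The node classes do not reflect the actual :mod:`bsfs.query.ast` module, and the generated query does not reflect
+what any BSFS backend emits. Values are not escaped, and schema validation and access control are omitted::
+
+    from dataclasses import dataclass
+    from typing import List, Optional, Union
+
+
+    @dataclass(frozen=True)
+    class Equals:
+        """Hypothetical leaf node: the current value must equal a literal."""
+        value: str
+
+
+    @dataclass(frozen=True)
+    class Predicate:
+        """Hypothetical node: follow a predicate, then test its value."""
+        uri: str
+        condition: 'Expr'
+
+
+    @dataclass(frozen=True)
+    class And:
+        """Hypothetical node: both sub-expressions must hold."""
+        left: 'Expr'
+        right: 'Expr'
+
+
+    Expr = Union[Equals, Predicate, And]
+
+
+    def to_sparql(expr: Expr, var: str = '?node', counter: Optional[List[int]] = None) -> List[str]:
+        """Translate an AST into SPARQL pattern lines about the variable var."""
+        if counter is None:
+            counter = [0]  # shared counter for fresh variable names
+        if isinstance(expr, And):
+            return to_sparql(expr.left, var, counter) + to_sparql(expr.right, var, counter)
+        if isinstance(expr, Predicate):
+            counter[0] += 1
+            value = f'?v{counter[0]}'
+            return [f'{var} <{expr.uri}> {value} .'] + to_sparql(expr.condition, value, counter)
+        if isinstance(expr, Equals):
+            return [f'FILTER ({var} = "{expr.value}")']
+        raise TypeError(f'unsupported AST node: {expr!r}')
+
+
+    query = And(
+        Predicate('http://schema.bsfs.io/Entity#author', Equals('me')),
+        Predicate('http://schema.bsfs.io/Entity#filename', Equals('notes.txt')),  # hypothetical predicate
+    )
+    print('SELECT ?node WHERE {\n  ' + '\n  '.join(to_sparql(query)) + '\n}')
+
+Running it prints a ``SELECT`` query with one triple pattern and one ``FILTER`` per predicate condition.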
+
+
+.. _RDF: https://www.w3.org/RDF/
+.. _RDFLib: https://rdflib.readthedocs.io/en/stable/index.html
+.. _Blazegraph: https://blazegraph.com/
+.. _Titan: http://titan.thinkaurelius.com/
+.. _TypeDB: https://vaticle.com/
+.. _Turtle: https://www.w3.org/TR/turtle/
+
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
new file mode 100644
index 0000000..91d53f6
--- /dev/null
+++ b/doc/source/index.rst
@@ -0,0 +1,75 @@
+
+The Black Star File System
+==========================
+
+A file system has two roles: It has to specify how to write files to a medium, and it has to define how a user can access files.
+Most file systems focus on the first role and adopt the standard directory tree approach for the second role.
+It is of course necessary to solve the challenges of medium access, but we should not neglect the user's perspective.
+As a user, I mostly care about how conveniently I can organize my data and how quickly I can access relevant information.
+The hierarchical approach is rather restrictive in this regard:
+You can only organize files in a directory tree [#f1]_, and search tasks often require third-party tools like `find`_ or `locate`_.
+
+Tagging file systems propose an alternative file organization model.
+Instead of placing files in directories, they assign one or more (user-defined) tags to each file.
+This increases the flexibility over a hierarchical data model,
+because you can group any combination of files, and each file can be a part of various groups.
+Semantic file systems push this idea one step further by trying to understand
+the data they're dealing with.
+For example, files can be grouped by their data type (documents), file format (odt),
+author (yourself), topic (information management), etc.
+The benefit for the user is that they can browse their files by association rather than by location, similar to how we navigate the Web.
+
+Clearly, the hierarchical approach is insufficient to organize this variety of information.
+Instead, we need a network of files,
+where files can be connected to each other, to their properties, or to auxiliary nodes
+(such as tags, collections, etc.) under a given relationship.
+We call this the file graph.
+With the *Black Star File System (BSFS)*, you can store, manage, and query such a file graph.
+
+..
+ TODO: Clarify
+ * Different relationships
+ * Properties and auxiliary nodes
+
+ TODO: File graph image
+ TODO: SFS/TFS references
+
+ TODO: BSFS features
+ Within BSFS, you can store the file content, file metadata,
+ and content-derived information (e.g., features) alike.
+
+ Within the file graph, we link files directly,
+ through properties, or through intermediate nodes.
+
+The Black Star File System is designed with three query patterns in mind:
+navigation, search, and browsing.
+
+The **navigation** pattern describes the case when the user knows exactly what they want,
+and they already have an address or id of the target file.
+BSFS identifies each file by a unique URI;
+alternatively, you can quickly navigate to a file via its name or other file properties.
+
+A **search** occurs when the user lacks the specific address or identifier of a target file,
+but they have relatively clear and narrow search criteria.
+With BSFS, you can search by file properties (name, size), content (keywords, features),
+or associations to other files and auxiliary nodes (tags, collections).
+
+**Browsing** takes place when the user has only vague query criteria but wants to quickly scan and compare many files.
+In BSFS, you can browse along file associations and rank results by a variety of similarity metrics.
+
+.. toctree::
+ :maxdepth: 1
+
+ installation
+ concepts
+ architecture
+ api/modules
+
+
+.. [#f1] Although links and similar techniques allow some deviation from this principle.
+
+.. _find: https://www.gnu.org/software/findutils/manual/html_node/find_html/Invoking-find.html#Invoking-find
+
+.. _locate: https://www.gnu.org/software/findutils/manual/html_node/find_html/Invoking-locate.html
+
+
diff --git a/doc/source/installation.rst b/doc/source/installation.rst
new file mode 100644
index 0000000..c7d8fba
--- /dev/null
+++ b/doc/source/installation.rst
@@ -0,0 +1,46 @@
+
+Installation
+============
+
+Installation
+------------
+
+Install *BSFS* via pip::
+
+ pip install --extra-index-url https://pip.bsfs.io bsfs
+
+This installs the ``bsfs`` Python package as well as the ``bsfs.app`` command.
+It is recommended to install *bsfs* in a virtual environment (e.g., via ``virtualenv``).
+
+
+License
+-------
+
+This project is released under the terms of the 3-clause BSD License.
+By downloading or using the application you agree to the license's terms and conditions.
+
+.. literalinclude:: ../../LICENSE
+
+
+Source
+------
+
+Check out our git repository::
+
+ git clone https://git.bsfs.io/bsfs.git
+
+You can then install *bsfs* from your bsfs source directory via the usual `setuptools <https://setuptools.pypa.io/en/latest/index.html>`_ commands::
+
+ python setup.py develop
+
+For development, you also need to install some additional dependencies::
+
+ # code style discipline
+ pip install mypy coverage pylint
+
+ # documentation
+ pip install sphinx sphinx-copybutton furo
+
+ # packaging
+ pip install build
+