author | Matthias Baumgartner <dev@igsor.net> | 2023-03-04 17:05:47 +0100 |
---|---|---|
committer | Matthias Baumgartner <dev@igsor.net> | 2023-03-04 17:05:47 +0100 |
commit | 87004fa65cc4833cfdbd9a24ba149123c7020edb (patch) | |
tree | f7cae71b684b49f2eecd720bda6a2995438b4aa6 /doc | |
parent | 4fead04055be4967d9ea3b24ff61fe37a93108dd (diff) | |
documentation
Diffstat (limited to 'doc')
-rw-r--r-- | doc/source/architecture.rst | 87 | ||||
-rw-r--r-- | doc/source/concepts.rst | 98 | ||||
-rw-r--r-- | doc/source/index.rst | 75 | ||||
-rw-r--r-- | doc/source/installation.rst | 46 |
4 files changed, 306 insertions, 0 deletions
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
new file mode 100644
index 0000000..4cca49a
--- /dev/null
+++ b/doc/source/architecture.rst
@@ -0,0 +1,87 @@
+
+Architecture
+============
+
+The BSFS stack can be coarsely divided into four parts (see the image below).
+
+* Envelope: Essentials and utils used throughout the whole codebase.
+* Front: End-user applications and APIs.
+* Center: The core interfaces and functionality.
+* Back: The triple store backends.
+
+Details of these components are given in the sections below.
+
+
+.. image:: _static/arch_light.png
+   :class: only-light
+
+.. image:: _static/arch_dark.png
+   :class: only-dark
+
+
+Envelope
+--------
+
+Most notably, the envelope covers the :class:`Schema <bsfs.schema.schema.Schema>` and the :mod:`Query syntax trees (AST) <bsfs.query.ast>`.
+Both of them are essential for all parts of the BSFS stack.
+For example, the schema is specified by the user via the :func:`Migrate <bsfs.apps.migrate.main>` command, checked and extended by the :class:`Graph <bsfs.graph.graph.Graph>`, and ultimately stored by a :class:`Triple Store backend <bsfs.triple_store.base.TripleStoreBase>`.
+Similarly, the Query AST may be provided by a caller and is translated to a database query by a backend.
+In addition, the envelope also contains some classes to handle URIs:
+:class:`URI <bsfs.utils.uri.URI>` defines the URI base class,
+:class:`Namespace <bsfs.namespace.Namespace>` provides shortcuts to generate URIs, and
+:mod:`UUID <bsfs.utils.uuid>` is used to generate unique URIs.
+
+
+Front
+-----
+
+The front consists of exposed interfaces such as end-user applications or APIs,
+and all utils needed to offer this functionality.
+See :mod:`bsfs.apps` and :mod:`bsfs.front`.
+
+
+Center
+------
+
+The heart of BSFS is grouped around the :mod:`bsfs.graph` module.
+These classes provide the interface to navigate and manipulate the file graph
+in a safe and programmer-friendly manner.
+Some of them are indirectly exposed through the higher-level APIs.
+
+The two core design principles of BSFS are the focus on nodes and batch processing.
+They are realized in the Graph and Nodes classes.
+The :class:`Graph class <bsfs.graph.graph.Graph>` manages the graph as a whole,
+and offers methods to get a specific set of Nodes.
+In turn, the :class:`Nodes class <bsfs.graph.nodes.Nodes>` represents such a set of nodes,
+and performs operations on the whole node set at once.
+Besides, the :mod:`bsfs.graph` module also comes with some syntactic sugar.
+
+Example::
+
+    # Open a file graph.
+    from bsfs import Open, ns
+    graph = Open(...)
+    # Get all nodes of type File.
+    nodes = graph.all(ns.bsfs.File)
+    # Set the author of all nodes at once.
+    nodes.set(ns.bse.author, 'Myself')
+    # Retrieve the author of all nodes at once.
+    set(nodes.get(ns.bse.author, node=False))
+    # Same as above, but shorter.
+    set(nodes.comment(node=False))
+
+
+Back
+----
+
+There are various graph databases (e.g., `RDFLib`_, `Blazegraph`_, and `Titan`_),
+and it would be foolish to replicate the work that others have done.
+Instead, we use third-party stores that take care of how to store and manage the data.
+The :class:`Backend base class <bsfs.triple_store.base.TripleStoreBase>` defines the
+interface to integrate any such third-party store into BSFS.
+Besides storing the data, a triple store backend also needs to track the current schema.
+
+
+.. _RDFLib: https://rdflib.readthedocs.io/en/stable/index.html
+.. _Blazegraph: https://blazegraph.com/
+.. _Titan: http://titan.thinkaurelius.com/
diff --git a/doc/source/concepts.rst b/doc/source/concepts.rst
new file mode 100644
index 0000000..9c2ed43
--- /dev/null
+++ b/doc/source/concepts.rst
@@ -0,0 +1,98 @@
+
+Core concepts
+=============
+
+In the following, we present a few core concepts that should help in understanding the BSFS operations and codebase.
+
+
+Graph storage
+-------------
+
+`RDF`_ describes a network or graph like the file graph as a set of
+*(subject, predicate, object)* triples.
+*Subject* is the identifier of the source node,
+*object* is the identifier of the target node (or a literal value),
+and *predicate* is the type of relation between the source node and the target.
+As suggested by `RDF`_, we use URIs to identify nodes and predicates.
+For example, a triple that assigns me as the author of a file could look like this::
+
+    <http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> <http://example.com/me>
+
+Note that alternatively, the *object* could also be a literal value ("me")::
+
+    <http://example.com/file#1234> <https://bsfs.io/schema/Entity#author> "me"
+
+There are a number of graph databases that support this or an analogous paradigm,
+such as `RDFLib`_, `Blazegraph`_, `TypeDB`_, `Titan`_,
+and `many more <https://en.wikipedia.org/wiki/Graph_database#List_of_graph_databases>`_.
+BSFS uses such a third-party graph database to store its file graph.
+
+As usual in database systems,
+we have to distinguish schema data (that governs the structure of the storage)
+from instance data (the actual database content).
+Similar to relational database systems,
+both kinds of data can be represented as triples,
+and subsequently stored within the same graph storage
+(although one might need to separate them logically).
+In BSFS, we employ an explicit schema (see next section) that is managed alongside the data.
+
+
+
+Schema
+------
+
+BSFS ensures consistency across multiple distributed client applications
+by maintaining an explicit schema that governs node types and predicates.
+Furthermore, exposing the schema allows clients to run a number of compatibility and validity checks
+locally, and a graph database may use the schema to optimize its storage or operations.
+
+In BSFS, the schema is initially provided by the system administrator
+(usually in the `Turtle`_ format)
+and subsequently stored by the backend.
+The default schema defines three root types
+(``bsfs:Node``, ``bsfs:Predicate``, and ``bsfs:Literal``),
+and BSFS expects any node, literal, or predicate to be derived from these roots.
+
+For example, a new predicate can be defined like so::
+
+    # define some abbreviations
+    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
+    prefix bsfs: <http://schema.bsfs.io/>
+    prefix bse: <http://schema.bsfs.io/Entity#>
+
+    # define a node type
+    bsfs:Entity rdfs:subClassOf bsfs:Node .
+
+    # define a literal type
+    xsd:string rdfs:subClassOf bsfs:Literal .
+
+    # define a predicate ("author of a node")
+    bse:author rdfs:subClassOf bsfs:Predicate ;
+        rdfs:domain bsfs:Entity ;
+        rdfs:range xsd:string .
+
+BSFS checks all requests and rejects queries or operations that violate the schema.
+
+
+Querying
+--------
+
+BSFS at its core is not much more than a translator from a user query into a graph database query.
+It operates directly on three abstract syntax trees (AST),
+to run fetch, search, or sort queries, respectively.
+By not using an existing query language,
+we avoid an unnecessary and possibly expensive parsing step.
+Some routines create an AST internally (e.g., :func:`bsfs.graph.graph.Graph.all`),
+others accept a user-defined AST (e.g., :func:`bsfs.graph.graph.Graph.get`).
+One way or another, the AST is validated against the schema,
+and access control conditions are added.
+
+
+.. _RDF: https://www.w3.org/RDF/
+.. _RDFLib: https://rdflib.readthedocs.io/en/stable/index.html
+.. _Blazegraph: https://blazegraph.com/
+.. _Titan: http://titan.thinkaurelius.com/
+.. _TypeDB: https://vaticle.com/
+.. _Turtle: https://www.w3.org/TR/turtle/
+
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
new file mode 100644
index 0000000..91d53f6
--- /dev/null
+++ b/doc/source/index.rst
@@ -0,0 +1,75 @@
+
+The Black Star File System
+==========================
+
+A file system has two roles: It has to specify how to write files to a medium, and it has to define how a user can access files.
+Most file systems focus on the first role and adopt the standard directory tree approach for the second role.
+It is of course necessary to solve the challenges of medium access, but we should not neglect the user's perspective.
+As a user, I mostly care about how conveniently I can organize my data, and how quickly I can access relevant information.
+The hierarchical approach is rather restrictive in this regard:
+You can only organize files in a directory tree [#f1]_, and search tasks often require third-party tools like `find`_ or `locate`_.
+
+Tagging file systems proposed an alternative file organization model.
+Instead of placing files in directories, they assign one or more (user-defined) tags to each file.
+This increases the flexibility over a hierarchical data model,
+because you can group any combination of files, and each file can be a part of various groups.
+Semantic file systems push this idea one step further by trying to understand
+the data they're dealing with.
+For example, files can be grouped by their data type (documents), file format (odt),
+author (yourself), topic (information management), etc.
+The benefit for the user is that they can browse their files by association rather than by location --- similar to how we navigate the Web.
+
+Clearly, the hierarchical approach is insufficient to organize this variety of information.
+Instead, we need a network of files,
+where they can be connected to each other, their properties, or to auxiliary nodes
+(such as tags, collections, etc.) under a given relationship.
+We call this the file graph.
+With the *Black Star File System (BSFS)*, you can store, manage, and query such a file graph.
+
+..
+    TODO: Clarify
+      * Different relationships
+      * Properties and auxiliary nodes
+
+    TODO: File graph image
+    TODO: SFS/TFS references
+
+    TODO: BSFS features
+      Within BSFS, you can store the file content, file metadata,
+      and content-derived information (e.g., features) alike.
+
+      Within the file graph, we link files directly,
+      through properties, or through intermediate nodes.
+
+The Black Star File System is designed with three query patterns in mind:
+navigation, search, and browsing.
+
+The **navigation** pattern describes the case when the user knows exactly what they want,
+and they already have an address or id of the target file.
+BSFS identifies each file with a unique URI,
+or you can quickly navigate to a file via its name or other file properties.
+
+A **search** occurs when the user lacks the specific address or identifier to a target file,
+but they have relatively clear and narrow search criteria.
+With BSFS, you can search by file properties (name, size), content (keywords, features),
+or associations to other files and auxiliary nodes (tags, collections).
+
+**Browsing** takes place when the user has only vague query criteria but wants to quickly scan and compare many files.
+In BSFS, you can browse along file associations and rank results by a variety of similarity metrics.
+
+.. toctree::
+   :maxdepth: 1
+
+   installation
+   concepts
+   architecture
+   api/modules
+
+
+.. [#f1] although links and similar techniques allow some deviation from this principle
+
+.. _find: https://www.gnu.org/software/findutils/manual/html_node/find_html/Invoking-find.html#Invoking-find
+
+.. _locate: https://www.gnu.org/software/findutils/manual/html_node/find_html/Invoking-locate.html
+
+
diff --git a/doc/source/installation.rst b/doc/source/installation.rst
new file mode 100644
index 0000000..c7d8fba
--- /dev/null
+++ b/doc/source/installation.rst
@@ -0,0 +1,46 @@
+
+Installation
+============
+
+Installation
+------------
+
+Install *BSFS* via pip::
+
+    pip install --extra-index-url https://pip.bsfs.io bsfs
+
+This installs the `bsfs` Python package as well as the `bsfs.app` command.
+It is recommended to install *bsfs* in a virtual environment (via `virtualenv`).
+
+
+License
+-------
+
+This project is released under the terms of the 3-clause BSD License.
+By downloading or using the application you agree to the license's terms and conditions.
+
+.. literalinclude:: ../../LICENSE
+
+
+Source
+------
+
+Check out our git repository::
+
+    git clone https://git.bsfs.io/bsfs.git
+
+You can further install *bsfs* via the usual `setuptools <https://setuptools.pypa.io/en/latest/index.html>`_ commands from your bsfs source directory::
+
+    python setup.py develop
+
+For development, you also need to install some additional dependencies::
+
+    # code style discipline
+    pip install mypy coverage pylint
+
+    # documentation
+    pip install sphinx sphinx-copybutton furo
+
+    # packaging
+    pip install build
+
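
The triple model and the schema check described in ``doc/source/concepts.rst`` above can be illustrated with a minimal sketch. It is plain Python and does not use the BSFS API: ``Triple``, ``SCHEMA``, ``NODE_TYPES``, and ``violates_schema`` are hypothetical names introduced only for illustration, and the domain test is an assumption about what a schema check might involve, not a description of how BSFS implements it.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """A (subject, predicate, object) statement; all fields are plain strings here."""
    subject: str
    predicate: str
    object: str


# Hypothetical schema: predicate URI -> (domain type, range type),
# mirroring the bse:author definition from the Turtle example.
SCHEMA = {
    "http://schema.bsfs.io/Entity#author": ("bsfs:Entity", "xsd:string"),
}

# Hypothetical instance data: node URI -> node type.
NODE_TYPES = {
    "http://example.com/file#1234": "bsfs:Entity",
}


def violates_schema(triple: Triple) -> Optional[str]:
    """Return a reason if the triple violates SCHEMA, or None if it is acceptable."""
    if triple.predicate not in SCHEMA:
        return "unknown predicate"
    domain, _range = SCHEMA[triple.predicate]
    if NODE_TYPES.get(triple.subject) != domain:
        return "subject is not of type " + domain
    # The range is xsd:string and every object in this sketch is a str,
    # so no further range check is performed.
    return None


good = Triple("http://example.com/file#1234",
              "http://schema.bsfs.io/Entity#author", "me")
bad = Triple("http://example.com/file#1234",
             "http://example.com/unknown", "me")
print(violates_schema(good))
print(violates_schema(bad))
```

Keeping the schema as data next to the instance data is what lets a client run such a check locally before a request ever reaches the backend.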