author Matthias Baumgartner <dev@igsor.net> 2023-03-01 22:05:06 +0100
committer Matthias Baumgartner <dev@igsor.net> 2023-03-01 22:05:06 +0100
commit e0c4713c40367b4b41da926da0ba7ed05d47d54b
tree 6f224f0af22afa356be39a8852c513ed488eec17
parent 365b36a30eb0afb706b706e0fa32b414f9d51a90
documentation
-rw-r--r-- .gitignore                  |  1
-rw-r--r-- doc/Makefile                | 20
-rw-r--r-- doc/make.bat                | 35
-rw-r--r-- doc/source/architecture.rst | 71
-rw-r--r-- doc/source/conf.py          | 37
-rw-r--r-- doc/source/index.rst        | 26
-rw-r--r-- doc/source/installation.rst | 49
7 files changed, 239 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
index c046d71..d2785ad 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,6 +21,7 @@ build/
# doc builds
doc/build/
+doc/source/api
# testing data
test/reader/image/testimage.nef*
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 0000000..d0c3cbf
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+ @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+ @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/doc/make.bat b/doc/make.bat
new file mode 100644
index 0000000..747ffb7
--- /dev/null
+++ b/doc/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+ echo.
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+ echo.installed, then set the SPHINXBUILD environment variable to point
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
+ echo.may add the Sphinx directory to PATH.
+ echo.
+ echo.If you don't have Sphinx installed, grab it from
+ echo.https://www.sphinx-doc.org/
+ exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
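Both the Makefile and make.bat forward every target to Sphinx's "make mode", i.e. they run `sphinx-build -M <target> source build` followed by SPHINXOPTS and O. As an illustration only, the Python sketch below assembles the same command line; the helper names `sphinx_make_argv` and `run` are hypothetical and not part of this commit.

```python
import os
import shlex
import subprocess
from typing import List, Mapping, Optional


def sphinx_make_argv(
    target: str,
    sourcedir: str = "source",
    builddir: str = "build",
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the Makefile's catch-all rule:
    $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
    """
    env = os.environ if env is None else env
    # Like make.bat, fall back to "sphinx-build" when SPHINXBUILD is unset or empty.
    sphinxbuild = env.get("SPHINXBUILD") or "sphinx-build"
    # SPHINXOPTS and O are appended verbatim; POSIX shell-style splitting is an
    # approximation of how make/cmd.exe expand them.
    extra = shlex.split(env.get("SPHINXOPTS", "")) + shlex.split(env.get("O", ""))
    return [sphinxbuild, "-M", target, sourcedir, builddir, *extra]


def run(target: str = "help", cwd: str = ".") -> int:
    """Run the build; cwd should be the doc/ directory (make.bat uses pushd %~dp0)."""
    return subprocess.run(sphinx_make_argv(target), cwd=cwd).returncode
```

With no overrides, `sphinx_make_argv("html", env={})` yields `['sphinx-build', '-M', 'html', 'source', 'build']`, which is what `make html` runs; as in both scripts, `help` is the default target.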
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
new file mode 100644
index 0000000..750319e
--- /dev/null
+++ b/doc/source/architecture.rst
@@ -0,0 +1,71 @@
+
+Architecture
+============
+
+
+The information extraction pipeline passes through three stages of abstraction:
+
+1. File format
+2. Content
+3. Predicate-value pairs
+
+For example, an image can be stored in various file formats (JPEG, TIFF, PNG).
+In turn, a file format can store different kinds of information such as the image data (pixels) and additional metadata (image dimensions, EXIF tags).
+Finally, we translate the information read from the file into predicate-value pairs that can be attached to a file node in BSFS, e.g., ``(bse:filesize, 8150000)``, ``(bse:width, 6000)``, ``(bse:height, 4000)``, ``(bse:iso, 100)``, etc.
+
+The extraction pipeline is thus divided into
+:mod:`Readers <bsie.reader>` that abstract from file formats and content types,
+and :mod:`Extractors <bsie.extractor>` which produce predicate-value pairs from content artifacts.
+
+
+Readers
+-------
+
+:mod:`Readers <bsie.reader>` read the actual file (considering different file formats)
+and isolate specific content artifacts therein.
+The content artifact (in an internal representation)
+is then passed to an Extractor for further processing.
+
+For example, the :class:`Image <bsie.reader.image.Image>` reader aims at reading the content (pixels) of an image file.
+It automatically selects which Python package (e.g., `rawpy`_, `pillow`_)
+to use, depending on the image file format at hand.
+The image data is then converted into a ``PIL.Image.Image`` instance
+(irrespective of which package was used to read the data),
+and passed on to the extractor.
+
+
+Extractors
+----------
+
+:mod:`Extractors <bsie.extractor>` turn content artifacts into
+predicate-value pairs that can be inserted into a BSFS storage.
+The predicate is defined by each extractor, as prescribed by BSFS' schema handling.
+
+For example, the class :class:`ColorsSpatial <bsie.extractor.image.colors_spatial.ColorsSpatial>`
+determines regionally dominant colors from given pixel data.
+It then produces a feature vector and attaches it to the image file via the appropriate predicate.
+
+
+BSIE lib and apps
+-----------------
+
+The advantage of separating the reading and extraction steps is that multiple extractors
+can consume the same content, avoiding multiple re-reads of the same data.
+This close interaction between readers and extractors is encapsulated
+within the :class:`Pipeline <bsie.lib.pipeline.Pipeline>` class.
+
+Also, having to deal with various file formats and content artifacts
+potentially pulls in a large number of dependencies.
+To make matters worse, many of those might not be needed in a specific scenario,
+e.g., if a user only works with a limited set of file formats.
+BSIE therefore implements a best-effort approach,
+that is, modules that cannot be imported due to missing dependencies are ignored.
+
+With these two concerns taken care of,
+BSIE offers a few :mod:`end-user applications <bsie.apps>`
+that reduce the complexity of the task to a relatively simple command.
+
+
+
+.. _pillow: https://python-pillow.org/
+.. _rawpy: https://github.com/letmaik/rawpy
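To make the reader/extractor split and the best-effort loading described in architecture.rst more concrete, here is a minimal, self-contained Python sketch. It is not BSIE's implementation: the classes `PixelReader`, `SizeExtractor`, `DominantColorExtractor` and `ToyPipeline`, the function `load_optional`, and the in-memory "file" format are all hypothetical. Only the predicate names `bse:width` and `bse:height` come from the text above; `bse:color` is made up. The point it illustrates is that each file is read once and the same content artifact is handed to every extractor.

```python
import importlib
from collections import Counter
from typing import Any, Dict, Iterator, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Grid = List[List[Pixel]]  # hypothetical content artifact: rows of RGB pixels


class PixelReader:
    """Hypothetical reader: isolates the pixel content of a raw 'file'."""

    def __call__(self, raw: Dict[str, Any]) -> Optional[Grid]:
        # A real reader would pick a decoder per file format (e.g. rawpy vs.
        # pillow); here the "file" is a dict that may carry pixel rows.
        return raw.get("pixels")


class SizeExtractor:
    """Hypothetical extractor: emits width/height predicate-value pairs."""

    def extract(self, grid: Grid) -> Iterator[Tuple[str, Any]]:
        yield ("bse:width", len(grid[0]) if grid else 0)
        yield ("bse:height", len(grid))


class DominantColorExtractor:
    """Hypothetical, much simplified stand-in for a color extractor:
    one dominant color for the whole grid instead of one per region."""

    def extract(self, grid: Grid) -> Iterator[Tuple[str, Any]]:
        counts = Counter(px for row in grid for px in row)
        if counts:
            yield ("bse:color", counts.most_common(1)[0][0])  # made-up predicate


class ToyPipeline:
    """Reads each file once and feeds the artifact to all extractors."""

    def __init__(self, reader: Any, extractors: Sequence[Any]) -> None:
        self.reader = reader
        self.extractors = list(extractors)

    def __call__(self, raw: Dict[str, Any]) -> List[Tuple[str, Any]]:
        content = self.reader(raw)  # single read, shared by every extractor
        if content is None:
            return []
        return [pair for ex in self.extractors for pair in ex.extract(content)]


def load_optional(module_names: Sequence[str]) -> Dict[str, Any]:
    """Best-effort import: silently skip modules that cannot be imported."""
    loaded: Dict[str, Any] = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    return loaded
```

For example, `ToyPipeline(PixelReader(), [SizeExtractor(), DominantColorExtractor()])` applied to `{"pixels": [[(255, 0, 0), (255, 0, 0)], [(0, 0, 255), (255, 0, 0)]]}` returns `[("bse:width", 2), ("bse:height", 2), ("bse:color", (255, 0, 0))]`, and `load_optional(["json", "no_such_module"])` returns only the `json` module.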
diff --git a/doc/source/conf.py b/doc/source/conf.py
new file mode 100644
index 0000000..017e036
--- /dev/null
+++ b/doc/source/conf.py
@@ -0,0 +1,37 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'Black Star Information Extraction'
+copyright = '2023, Matthias Baumgartner'
+author = 'Matthias Baumgartner'
+release = '0.5'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+ 'sphinx_copybutton',
+ 'sphinx.ext.autodoc',
+ ]
+
+templates_path = ['_templates']
+exclude_patterns = []
+
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'furo'
+html_static_path = ['_static']
+
+html_title = 'bsie'
+html_theme_options = {
+ 'announcement': '<em>This project is under heavy development and subject to rapid changes. Use at your own discretion.</em>',
+ }
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
new file mode 100644
index 0000000..9cf06fe
--- /dev/null
+++ b/doc/source/index.rst
@@ -0,0 +1,26 @@
+
+Black Star Information Extraction
+=================================
+
+A major advantage of the `Black Star File System (BSFS) <https://www.bsfs.io/bsfs/>`_
+is its ability to store various kinds of (meta)data associated with a file.
+However, BSFS itself is only a storage solution;
+it does not inspect files or collect information about them.
+
+The Black Star Information Extraction (BSIE) package fills this gap by
+extracting various kinds of information from a file and pushing that data to a BSFS instance.
+
+BSIE has the ability to process numerous file formats,
+and it can turn various aspects of a file into usable information.
+This includes metadata from a source file system,
+metadata stored within the file,
+and even excerpts or feature representations of the file's content itself.
+
+.. toctree::
+ :maxdepth: 1
+
+ installation
+ architecture
+ api/modules
+
+
diff --git a/doc/source/installation.rst b/doc/source/installation.rst
new file mode 100644
index 0000000..42b1e4e
--- /dev/null
+++ b/doc/source/installation.rst
@@ -0,0 +1,49 @@
+
+Installation
+============
+
+Installation
+------------
+
+Install *bsie* via pip::
+
+ pip install --extra-index-url https://pip.bsfs.io bsie
+
+This installs the ``bsie`` Python package as well as the ``bsie.app`` command.
+It is recommended to install *bsie* in a virtual environment (e.g., via ``virtualenv``).
+
+
+License
+-------
+
+This project is released under the terms of the 3-clause BSD License.
+By downloading or using the application, you agree to the license's terms and conditions.
+
+.. literalinclude:: ../../LICENSE
+
+
+Source
+------
+
+Check out our git repository::
+
+ git clone https://git.bsfs.io/bsie.git
+
+You can further install *bsie* via the usual `setuptools <https://setuptools.pypa.io/en/latest/index.html>`_ commands from your bsie source directory::
+
+ python setup.py develop
+
+For development, you also need to install some additional dependencies::
+
+ # extra packages for tests
+ pip install rdflib requests
+
+ # code style discipline
+ pip install mypy coverage pylint
+
+ # documentation
+ pip install sphinx sphinx-copybutton furo
+
+ # packaging
+ pip install build
+
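Because the development dependencies listed in installation.rst are installed separately, it can help to check which of them are missing from the current environment. The sketch below does this with the standard library's `importlib.util.find_spec`, which locates a module without importing it. The helper `missing_packages` and the pip-name-to-import-name mapping are assumptions for illustration, not part of bsie.

```python
import importlib.util
from typing import List, Mapping

# pip package name -> top-level import name (assumed; verify for your versions)
DEV_PACKAGES: Mapping[str, str] = {
    "rdflib": "rdflib",
    "requests": "requests",
    "mypy": "mypy",
    "coverage": "coverage",
    "pylint": "pylint",
    "sphinx": "sphinx",
    "sphinx-copybutton": "sphinx_copybutton",
    "furo": "furo",
    "build": "build",  # caveat: a stray build/ directory on sys.path may be found instead
}


def missing_packages(packages: Mapping[str, str] = DEV_PACKAGES) -> List[str]:
    """Return the pip names whose import name cannot be located."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("pip install " + " ".join(missing) if missing else "all dev dependencies found")
```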