author | Matthias Baumgartner <dev@igsor.net> | 2023-03-01 22:05:06 +0100 |
---|---|---|
committer | Matthias Baumgartner <dev@igsor.net> | 2023-03-01 22:05:06 +0100 |
commit | e0c4713c40367b4b41da926da0ba7ed05d47d54b (patch) | |
tree | 6f224f0af22afa356be39a8852c513ed488eec17 /doc | |
parent | 365b36a30eb0afb706b706e0fa32b414f9d51a90 (diff) | |
documentation
Diffstat (limited to 'doc')
-rw-r--r-- | doc/Makefile | 20 |
-rw-r--r-- | doc/make.bat | 35 |
-rw-r--r-- | doc/source/architecture.rst | 71 |
-rw-r--r-- | doc/source/conf.py | 37 |
-rw-r--r-- | doc/source/index.rst | 26 |
-rw-r--r-- | doc/source/installation.rst | 49 |
6 files changed, 238 insertions, 0 deletions
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 0000000..d0c3cbf
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/doc/make.bat b/doc/make.bat
new file mode 100644
index 0000000..747ffb7
--- /dev/null
+++ b/doc/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+ echo.
+ echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+ echo.installed, then set the SPHINXBUILD environment variable to point
+ echo.to the full path of the 'sphinx-build' executable. Alternatively you
+ echo.may add the Sphinx directory to PATH.
+ echo.
+ echo.If you don't have Sphinx installed, grab it from
+ echo.https://www.sphinx-doc.org/
+ exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst
new file mode 100644
index 0000000..750319e
--- /dev/null
+++ b/doc/source/architecture.rst
@@ -0,0 +1,71 @@
+
+Architecture
+============
+
+
+The information extraction pipeline traverses three stages of abstraction:
+
+1. File format
+2. Content
+3. Predicate-value pairs
+
+For example, an image can be stored in various file formats (JPEG, TIFF, PNG).
+In turn, a file format can store different kinds of information such as the image data (pixels) and additional metadata (image dimensions, EXIF tags).
+Finally, we translate the information read from the file into predicate-value pairs that can be attached to a file node in BSFS, e.g., ``(bse:filesize, 8150000)``, ``(bse:width, 6000)``, ``(bse:height, 4000)``, ``(bse:iso, 100)``, etc.
+
+The extraction pipeline is thus divided into
+:mod:`Readers <bsie.reader>` that abstract from file formats and content types,
+and :mod:`Extractors <bsie.extractor>` which produce predicate-value pairs from content artifacts.
+
+
+Readers
+-------
+
+:mod:`Readers <bsie.reader>` read the actual file (considering different file formats)
+and isolate specific content artifacts therein.
+The content artifact (in an internal representation)
+is then passed to an Extractor for further processing.
+
+For example, the :class:`Image <bsie.reader.image.Image>` reader aims at reading the content (pixels) of an image file.
+It automatically detects which python package (e.g., `rawpy`_, `pillow`_)
+to use when faced with the various existing image file formats.
+The image data is then converted into a PIL.Image instance
+(irrespective of which package was used to read the data),
+and passed on to the extractor.
+
+
+Extractors
+----------
+
+:mod:`Extractors <bsie.extractor>` turn content artifacts into
+predicate-value pairs that can be inserted into a BSFS storage.
+The predicate is defined by each extractor, as prescribed by BSFS' schema handling.
+
+For example, the class :class:`ColorsSpatial <bsie.extractor.image.colors_spatial.ColorsSpatial>`
+determines regionally dominant colors from given pixel data.
+It then produces a feature vector and attaches it to the image file via the appropriate predicate.
+
+
+BSIE lib and apps
+-----------------
+
+The advantage of separating the reading and extraction steps is that multiple extractors
+can consume the same content, avoiding multiple re-reads of the same data.
+This close interaction between readers and extractors is encapsulated
+within the :class:`Pipeline <bsie.lib.pipeline.Pipeline>` class.
+
+Also, having to deal with various file formats and content artifacts
+potentially pulls in a large number of dependencies.
+To make matters worse, many of those might not be needed in a specific scenario,
+e.g., if a user only works with a limited set of file formats.
+BSIE therefore implements a best-effort approach,
+that is, modules that cannot be imported due to missing dependencies are ignored.
+
+With these two concerns taken care of,
+BSIE offers a few :mod:`end-user applications <bsie.apps>`
+that reduce the complexity of the task to a relatively simple command.
+
+
+
+.. _pillow: https://python-pillow.org/
+.. _rawpy: https://github.com/letmaik/rawpy
diff --git a/doc/source/conf.py b/doc/source/conf.py
new file mode 100644
index 0000000..017e036
--- /dev/null
+++ b/doc/source/conf.py
@@ -0,0 +1,37 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'Black Star Information Extraction'
+copyright = '2023, Matthias Baumgartner'
+author = 'Matthias Baumgartner'
+release = '0.5'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    'sphinx_copybutton',
+    'sphinx.ext.autodoc',
+    ]
+
+templates_path = ['_templates']
+exclude_patterns = []
+
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'furo'
+html_static_path = ['_static']
+
+html_title = 'bsie'
+html_theme_options = {
+    'announcement': '<em>This project is under heavy development and subject to rapid changes. Use at your own discretion.</em>',
+    }
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
new file mode 100644
index 0000000..9cf06fe
--- /dev/null
+++ b/doc/source/index.rst
@@ -0,0 +1,26 @@
+
+Black Star Information Extraction
+=================================
+
+A major advantage of the `Black Star File System (BSFS) <https://www.bsfs.io/bsfs/>`_
+is its ability to store various kinds of (meta)data associated with a file.
+However, the BSFS itself is only a storage solution;
+it does not inspect files or collect information about them.
+
+The Black Star Information Extraction (BSIE) package fills this gap by
+extracting various kinds of information from a file and pushing that data to a BSFS instance.
+
+BSIE has the ability to process numerous file formats,
+and it can turn various aspects of a file into usable information.
+This includes metadata from a source file system,
+metadata stored within the file,
+and even excerpts or feature representations of the file's content itself.
+
+.. toctree::
+   :maxdepth: 1
+
+   installation
+   architecture
+   api/modules
+
+
diff --git a/doc/source/installation.rst b/doc/source/installation.rst
new file mode 100644
index 0000000..42b1e4e
--- /dev/null
+++ b/doc/source/installation.rst
@@ -0,0 +1,49 @@
+
+Installation
+============
+
+Installation
+------------
+
+Install *bsie* via pip::
+
+    pip install --extra-index-url https://pip.bsfs.io bsie
+
+This installs the `bsie` python package as well as the `bsie.app` command.
+It is recommended to install *bsie* in a virtual environment (via `virtualenv`).
+
+
+License
+-------
+
+This project is released under the terms of the 3-clause BSD License.
+By downloading or using the application you agree to the license's terms and conditions.
+
+.. literalinclude:: ../../LICENSE
+
+
+Source
+------
+
+Check out our git repository::
+
+    git clone https://git.bsfs.io/bsie.git
+
+You can further install *bsie* via the usual `setuptools <https://setuptools.pypa.io/en/latest/index.html>`_ commands from your bsie source directory::
+
+    python setup.py develop
+
+For development, you also need to install some additional dependencies::
+
+    # extra packages for tests
+    pip install rdflib requests
+
+    # code style discipline
+    pip install mypy coverage pylint
+
+    # documentation
+    pip install sphinx sphinx-copybutton furo
+
+    # packaging
+    pip install build
+
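
The reader/extractor split and the best-effort import described in doc/source/architecture.rst can be illustrated with a small, self-contained sketch. This is not the bsie API: `ToyPipeline`, `best_effort_import`, and the two example extractors are hypothetical names chosen for illustration, and the real `bsie.lib.pipeline.Pipeline` may be organized quite differently. The only detail taken from the documentation is the `bse:filesize` predicate; `ex:first_byte` is made up.

```python
"""Hypothetical sketch of the reader/extractor split; not the bsie API."""
import importlib
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# A reader turns a file path into a content artifact.
Reader = Callable[[str], Any]
# An extractor turns a content artifact into (predicate, value) pairs.
Extractor = Callable[[Any], Iterable[Tuple[str, Any]]]


def best_effort_import(names: Iterable[str]) -> Dict[str, Any]:
    """Import the modules that are available and skip those that are not."""
    modules: Dict[str, Any] = {}
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ImportError:
            # Missing optional dependency: ignore this module.
            continue
    return modules


class ToyPipeline:
    """Pairs each reader with the extractors that consume its content."""

    def __init__(self, stages: List[Tuple[Reader, List[Extractor]]]):
        self.stages = stages

    def __call__(self, path: str) -> Iterator[Tuple[str, Any]]:
        for reader, extractors in self.stages:
            content = reader(path)  # the file is read once per reader
            for extractor in extractors:
                # every extractor reuses the same content artifact
                yield from extractor(content)


def filesize(content: bytes) -> Iterator[Tuple[str, Any]]:
    """Example extractor: size of the content in bytes."""
    yield ("bse:filesize", len(content))


def first_byte(content: bytes) -> Iterator[Tuple[str, Any]]:
    """Example extractor with a made-up predicate."""
    yield ("ex:first_byte", content[0])
```

A pipeline built as `ToyPipeline([(reader, [filesize, first_byte])])` calls `reader` once per file and passes the resulting content to both extractors, which is the benefit the architecture text attributes to separating the two steps.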