From e0c4713c40367b4b41da926da0ba7ed05d47d54b Mon Sep 17 00:00:00 2001 From: Matthias Baumgartner Date: Wed, 1 Mar 2023 22:05:06 +0100 Subject: documentation --- doc/source/architecture.rst | 71 +++++++++++++++++++++++++++++++++++++++++++++ doc/source/conf.py | 37 +++++++++++++++++++++++ doc/source/index.rst | 26 +++++++++++++++++ doc/source/installation.rst | 49 +++++++++++++++++++++++++++++++ 4 files changed, 183 insertions(+) create mode 100644 doc/source/architecture.rst create mode 100644 doc/source/conf.py create mode 100644 doc/source/index.rst create mode 100644 doc/source/installation.rst (limited to 'doc/source') diff --git a/doc/source/architecture.rst b/doc/source/architecture.rst new file mode 100644 index 0000000..750319e --- /dev/null +++ b/doc/source/architecture.rst @@ -0,0 +1,71 @@ + +Architecture +============ + + +The information extraction pipeline traverses through three stages of abstraction: + +1. File format +2. Content +3. Predicate-value pairs + +For example, an image can be stored in various file formats (JPEG, TIFF, PNG). +In turn, a file format can store different kinds of information such as the image data (pixels) and additional metadata (image dimensions, EXIF tags). +Finally, we translate the information read from the file into predicate-value pairs that can be attached to a file node in BSFS, e.g., ``(bse:filesize, 8150000)``, ``(bse:width, 6000)``, ``(bse:height, 4000)``, ``(bse:iso, 100)``, etc. + +The extraction pipeline is thus divided into +:mod:`Readers ` that abstract from file formats and content types, +and :mod:`Extractors ` which produce predicate-value pairs from content artifacts. + + +Readers +------- + +:mod:`Readers ` read the actual file (considering different file formats) +and isolate specific content artifacts therein. +The content artifact (in an internal representation) +is then passed to an Extractor for further processing. 
+ +For example, the :class:`Image ` reader aims at reading the content (pixels) of an image file. +It automatically detects which python package (e.g., `rawpy`_, `pillow`_) +to use when faced with the various existing image file formats. +The image data is then converted into a PIL.Image instance +(irrespective of which package was used to read the data), +and passed on to the extractor. + + +Extractors +---------- + +:mod:`Extractors ` turn content artifacts into +predicate-value pairs that can be inserted into a BSFS storage. +The predicate is defined by each extractor, as prescribed by BSFS' schema handling. + +For example, the class :class:`ColorsSpatial ` class. + +Also, that having to deal with various file formats and content artifacts +potentially pulls in a large number of dependencies. +To make matters worse, many of those might not be needed in a specific scenario, +e.g., if a user only works with a limited set of file formats. +BSIE therefore implements a best-effort approach, +that is modules that cannot be imported due to missing dependencies are ignored. + +With these two concerns taken care of, +BSIE offers a few :mod:`end-user applications ` +that reduce the complexity of the task to a relatively simple command. + + + +.. _pillow: https://python-pillow.org/ +.. _rawpy: https://github.com/letmaik/rawpy diff --git a/doc/source/conf.py b/doc/source/conf.py new file mode 100644 index 0000000..017e036 --- /dev/null +++ b/doc/source/conf.py @@ -0,0 +1,37 @@ +# Configuration file for the Sphinx documentation builder. 
+# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information + +project = 'Black Star Information Extraction' +copyright = '2023, Matthias Baumgartner' +author = 'Matthias Baumgartner' +release = '0.5' + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + +extensions = [ + 'sphinx_copybutton', + 'sphinx.ext.autodoc', + ] + +templates_path = ['_templates'] +exclude_patterns = [] + + + +# -- Options for HTML output ------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output + +html_theme = 'furo' +html_static_path = ['_static'] + +html_title = 'bsie' +html_theme_options = { + 'announcement': 'This project is under heavy development and subject to rapid changes. Use at your own discretion.', + } + diff --git a/doc/source/index.rst b/doc/source/index.rst new file mode 100644 index 0000000..9cf06fe --- /dev/null +++ b/doc/source/index.rst @@ -0,0 +1,26 @@ + +Black Star Information Extraction +================================= + +A major advantage of the `Black Star File System (BSFS) `_ +is its ability to store various kinds of (meta)data associated with a file. +However, the BSFS itself is only a storage solution, +it does not inspect files or collect information about them. + +The Black Star Information Extraction (BSIE) package fills this gap by +extracting various kinds of information from a file and pushing that data to a BSFS instance. + +BSIE has the ability to process numerous file formats, +and it can turn various aspects of a file into usable information. 
+This includes metadata from a source file system, +metadata stored within the file, +and even excerpts or feature representations of the file's content itself. + +.. toctree:: + :maxdepth: 1 + + installation + architecture + api/modules + + diff --git a/doc/source/installation.rst b/doc/source/installation.rst new file mode 100644 index 0000000..42b1e4e --- /dev/null +++ b/doc/source/installation.rst @@ -0,0 +1,49 @@ + +Installation +============ + +Installation +------------ + +Install *bsie* via pip:: + + pip install --extra-index-url https://pip.bsfs.io bsie + +This installs the `bsie` python package as well as the `bsie.app` command. +It is recommended to install *bsie* in a virtual environment (via `virtualenv`). + + +License +------- + +This project is released under the terms of the 3-clause BSD License. +By downloading or using the application you agree to the license's terms and conditions. + +.. literalinclude:: ../../LICENSE + + +Source +------ + +Check out our git repository:: + + git clone https://git.bsfs.io/bsie.git + +You can further install *bsie* via the usual `setuptools `_ commands from your bsie source directory:: + + python setup.py develop + +For development, you also need to install some additional dependencies:: + + # extra packages for tests + pip install rdflib requests + + # code style discipline + pip install mypy coverage pylint + + # documentation + pip install sphinx sphinx-copybutton furo + + # packaging + pip install build + -- cgit v1.2.3 From ba6329bbe14c832d42773dee2fe30bd7669ca255 Mon Sep 17 00:00:00 2001 From: Matthias Baumgartner Date: Thu, 2 Mar 2023 08:58:29 +0100 Subject: various minor fixes --- doc/source/installation.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'doc/source') diff --git a/doc/source/installation.rst b/doc/source/installation.rst index 42b1e4e..b634457 100644 --- a/doc/source/installation.rst +++ b/doc/source/installation.rst @@ -40,6 +40,8 @@ For development, you also need to install some 
additional dependencies:: # code style discipline pip install mypy coverage pylint + # external type annotations for pyyaml + pip install types-PyYAML # documentation pip install sphinx sphinx-copybutton furo -- cgit v1.2.3 From 8b460aa0232cd841af7b7734c91982bc83486e03 Mon Sep 17 00:00:00 2001 From: Matthias Baumgartner Date: Sun, 5 Mar 2023 19:14:11 +0100 Subject: build fixes --- doc/source/installation.rst | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) (limited to 'doc/source') diff --git a/doc/source/installation.rst b/doc/source/installation.rst index b634457..ee6fadb 100644 --- a/doc/source/installation.rst +++ b/doc/source/installation.rst @@ -2,15 +2,39 @@ Installation ============ -Installation ------------- +You can install *bsie* via pip. BSIE comes with support for various file formats. +For this, it needs to install many external packages. BSIE lets you control +which of these you want to install. Note that if you choose to not install +support for some file types, BSIE will show a warning and skip them. +All other formats will be processed normally. +It is recommended to install *bsie* in a virtual environment (via ``virtualenv``). -Install *bsie* via pip:: +To install only the minimally required software, use:: pip install --extra-index-url https://pip.bsfs.io bsie -This installs the `bsie` python package as well as the `bsie.app` command. -It is recommended to install *bsie* in a virtual environment (via `virtualenv`). +To install all dependencies, use the following shortcut:: + + pip install --extra-index-url https://pip.bsfs.io bsie[all] + +To install a subset of all dependencies, modify the extras part (``[image, preview]``) +of the following command to your liking:: + + pip install --extra-index-url https://pip.bsfs.io bsie[image,preview] + +Currently, BSIE provides the following extra flags: + +* image: Read data from image files. 
+ Note that you may also have to install ``exiftool`` through your system's + package manager (e.g. ``sudo apt install exiftool``). +* preview: Create previews from a variety of files. + Note that support for various file formats also depends on what + system packages you've installed. You should at least install ``imagemagick`` + through your system's package manager (e.g. ``sudo apt install imagemagick``). + See `Preview Generator `_ for + more detailed instructions. +* features: Extract feature vectors from images. + License -- cgit v1.2.3
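The installation notes in the last patch say that BSIE shows a warning and skips file types whose support was not installed, and the architecture notes call this a best-effort approach. The following is a minimal sketch of that general pattern in Python, not BSIE's actual implementation: the helper ``load_available`` and the module name ``hypothetical_missing_reader`` are invented for illustration, and ``json`` merely stands in for a module whose dependencies are present.

```python
import importlib
import warnings


def load_available(module_names):
    """Import each named module; warn about and skip those that cannot be imported."""
    loaded = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as err:
            # A missing optional dependency surfaces as ImportError
            # (ModuleNotFoundError is a subclass). Warn and carry on.
            warnings.warn(f"skipping {name}: {err}")
    return loaded


# 'json' ships with Python; the second name is hypothetical and will not import.
modules = load_available(["json", "hypothetical_missing_reader"])
```

With this shape, a caller only ever sees the modules that actually imported, so the remaining formats can be processed normally while unsupported ones are reported once and ignored.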