Black Star Information Extraction
The Black Star Information Extraction (BSIE) package provides a pipeline to extract metadata and content-derived features from files and stores that information in a BSFS storage.
Installation
You can install BSIE via pip. BSIE comes with support for various file formats. For this, it needs to install many external packages. BSIE lets you control which of these you want to install. Note that if you choose to not install support for some file types, BSIE will show a warning and skip them. All other formats will be processed normally.
To install only the minimally required software, use:
$ pip install --extra-index-url https://pip.bsfs.io bsie
To install all dependencies, use the following shortcut:
$ pip install --extra-index-url https://pip.bsfs.io bsie[all]
To install a subset of all dependencies, modify the extras part ([image,preview]) of the following command to your liking:
$ pip install --extra-index-url https://pip.bsfs.io bsie[image,preview]
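Some shells treat square brackets as glob patterns. zsh, for example, rejects an unquoted bsie[image,preview] with a "no matches found" error. The sketch below shows one way around this by quoting the requirement. The variable name requirement is only illustrative, and the command is printed rather than executed so that the snippet has no side effects.

```shell
# Keep the requirement in a quoted string so the shell passes the
# brackets to pip unchanged instead of trying to expand them as a glob.
requirement="bsie[image,preview]"

# Dry run: print the command instead of executing it.
# Remove the leading "echo" to actually install.
echo pip install --extra-index-url https://pip.bsfs.io "$requirement"
```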
Currently, BSIE provides the following extra flags:
- image: Read data from image files.
  Note that you may also have to install exiftool through your system's package manager (e.g. sudo apt install exiftool).
- preview: Create previews from a variety of files.
  Note that support for various file formats also depends on what system packages you've installed. You should at least install imagemagick through your system's package manager (e.g. sudo apt install imagemagick). See Preview Generator for more detailed instructions.
- features: Extract feature vectors from images.
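Since the image and preview extras rely on exiftool and ImageMagick being installed at the system level, a quick check of the PATH can save some debugging. The following is a minimal sketch, not part of BSIE. The helper name check_tool is hypothetical, and the binary names assume that ImageMagick 7 provides magick while ImageMagick 6 provides convert.

```shell
# Hypothetical helper (not shipped with BSIE): report whether a
# command is available on PATH. Returns 1 if it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Needed by the image extra.
check_tool exiftool || true

# Needed by the preview extra. ImageMagick 7 ships "magick",
# ImageMagick 6 ships "convert", so either one is accepted.
check_tool magick || check_tool convert || true
```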
Development
Set up a virtual environment:
$ virtualenv env
$ source env/bin/activate
Install bsie as editable from the git repository:
$ git clone https://git.bsfs.io/bsie.git
$ cd bsie
$ pip install -e .[all]
If you want to develop (dev), run the tests (test), edit the documentation (doc), or build a distributable (build), install bsie with the respective extras (in addition to file format extras):
$ pip install -e .[dev,doc,build,test]
Or, you can manually install the following packages besides BSIE:
$ pip install coverage mypy pylint
$ pip install rdflib requests types-PyYAML
$ pip install sphinx sphinx-copybutton furo
$ pip install build
To ensure code style discipline, run the following commands:
$ coverage run ; coverage html ; xdg-open .htmlcov/index.html
$ pylint bsie
$ mypy
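When running these checks regularly, it can be convenient to run them in one go and get a summary of which ones failed. The sketch below is one possible wrapper and not part of the BSIE repository. The function name run_checks is hypothetical. It runs each command string it is given, keeps going after a failure, and returns the number of failed commands.

```shell
# Hypothetical wrapper (not part of BSIE): run each command string,
# report its status, and return the number of failed commands.
run_checks() {
    failed=0
    for cmd in "$@"; do
        if sh -c "$cmd"; then
            echo "ok: $cmd"
        else
            echo "FAILED: $cmd"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example invocation from the repository root (commented out so that
# the snippet itself has no side effects):
# run_checks "coverage run" "pylint bsie" "mypy"
```

A non-zero return value makes the wrapper usable in scripts or git hooks that should abort when any check fails.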
To build the package, do:
$ python -m build
To run only the tests (without coverage), run the following command from the test folder:
$ python -m unittest
To build the documentation, run the following commands from the doc folder:
$ sphinx-apidoc -f -o source/api ../bsie/ --module-first -d 1 --separate
$ make html
$ xdg-open build/html/index.html