==================
Using the database
==================
|project| uses a `mongodb `_ database to store file metadata. ``mongodb``
databases store nested metadata similar to python dictionaries. It is recommended to familiarise
yourself with the basics of ``mongodb`` before proceeding.
The relevant terminology is as follows:

**Document**: A single item in the database. A document is roughly equivalent to a row in a relational database table, but it can contain nested (hierarchical) data.

**Collection**: A set of documents, similar to a table in a relational database. Documents in a collection do not have to share the same data structure.

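To make the terminology concrete, the following sketch models a collection as a plain python list of dictionaries. It does not talk to ``mongodb`` at all. The ``filename`` values and the ``r_band`` filter name are hypothetical and serve only as illustration; the other field names are the ones used in the examples on this page.

.. code-block:: python

    # Two documents in the same mock collection.
    # The "filename" values and "r_band" are hypothetical placeholders.
    exposure_a = {
        "filename": "exposure_a.fits",
        "physical_filter": "g_band",
        "exposure_time": 1,
        "metrics": {"has_wcs": True},  # nested (hierarchical) data
    }
    exposure_b = {
        "filename": "exposure_b.fits",
        "physical_filter": "r_band",
        "exposure_time": 30,
        # No "metrics" entry: documents need not share the same structure.
    }

    # A collection is a set of documents, modelled here as a list.
    mock_collection = [exposure_a, exposure_b]

    # Nested values are reached by walking down the hierarchy.
    nested_value = mock_collection[0]["metrics"]["has_wcs"]
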
Collections
===========
|project| implements the ``Collection`` class as a wrapper around ``mongodb`` collections to ensure
standardisation of the database. ``Collection`` instances are used to perform all database operations,
including inserting, deleting, modifying and searching for documents.
The following ``Collection`` subclasses are currently implemented:

``ExposureCollection``: This is used to store metadata for all exposures, with each document corresponding to a single exposure.

``CalibCollection``: This stores metadata for master calibration files. Each document corresponds to a single calibration file.

Collection instances should be created like this:

.. code-block:: python

    from huntsman.drp.collection import ExposureCollection
    collection = ExposureCollection.from_config()

Using ``from_config`` ensures that the class instances will be correctly initialised from the config file.
Once a ``Collection`` instance is created, it automatically connects to the ``mongodb`` client and is ready for use.
Querying for files
~~~~~~~~~~~~~~~~~~
File queries are performed using **document filters**, which are simple python dictionaries. For example,
to find all one-second exposures taken using the ``g_band`` filter, one can do:

.. code-block:: python

    document_filter = {"physical_filter": "g_band", "exposure_time": 1}
    docs = collection.find(document_filter)

Dot notation can be used to query for nested items, e.g.:

.. code-block:: python

    document_filter = {"metrics.has_wcs": True}

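As a rough illustration of what a dotted key refers to, the sketch below resolves one against a plain nested dictionary. The helper ``get_nested`` is hypothetical and is not part of |project| or ``mongodb``; the real matching rules in ``mongodb`` cover more cases (for example arrays), so treat this as a simplification.

.. code-block:: python

    def get_nested(document, dotted_key, default=None):
        """Resolve a dot-separated key against a nested dictionary.

        Hypothetical helper for illustration only.
        """
        value = document
        for part in dotted_key.split("."):
            # Stop as soon as the path cannot be followed any further.
            if not isinstance(value, dict) or part not in value:
                return default
            value = value[part]
        return value

    document = {"physical_filter": "g_band", "metrics": {"has_wcs": True}}

    has_wcs = get_nested(document, "metrics.has_wcs")
    missing = get_nested(document, "metrics.not_present")

Here ``has_wcs`` is ``True`` and ``missing`` falls back to the default of ``None``.
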
For more advanced queries, we can use `mongodb query operators `_, which are specified as part of the
document filter. For example, to query for files with exposure times greater than one second, we can do:

.. code-block:: python

    document_filter = {"exposure_time": {"$gt": 1}}

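Several operators can be combined in one document filter. The sketch below builds such a filter with the standard ``mongodb`` operators ``$gt``, ``$lte`` and ``$in``, and checks it against plain dictionaries using a toy ``matches`` function. That function is hypothetical: it only handles top-level keys and a handful of operators, and it is not how ``mongodb`` or |project| evaluates queries. The ``r_band`` filter name is also an assumption.

.. code-block:: python

    import operator

    # Toy implementations of a few standard mongodb comparison operators.
    OPERATORS = {
        "$gt": operator.gt,
        "$gte": operator.ge,
        "$lt": operator.lt,
        "$lte": operator.le,
        "$in": lambda value, options: value in options,
    }

    def matches(document, document_filter):
        """Return True if a flat document satisfies every filter condition."""
        for key, condition in document_filter.items():
            if key not in document:
                return False
            value = document[key]
            if isinstance(condition, dict):
                # All operators given for the same field must hold.
                if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                    return False
            elif value != condition:
                return False
        return True

    # Exposure time in the interval (1, 60] and one of two filters.
    document_filter = {
        "exposure_time": {"$gt": 1, "$lte": 60},
        "physical_filter": {"$in": ["g_band", "r_band"]},  # "r_band" is hypothetical
    }

    long_exposure = {"exposure_time": 30, "physical_filter": "g_band"}
    short_exposure = {"exposure_time": 1, "physical_filter": "g_band"}

    long_matches = matches(long_exposure, document_filter)
    short_matches = matches(short_exposure, document_filter)

The same ``document_filter`` dictionary could be passed to ``collection.find`` in place of the simpler filters shown above.
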
There are also various keyword arguments that can be used with ``collection.find``. For example, to
query in a date range, we can do:

.. code-block:: python

    docs = collection.find(date_min="2021-01-01", date_max="2021-02-01")

Please see the ``Collection`` API for more details.
The return value from ``Collection.find`` is a list of ``Document`` objects. These behave very similarly
to python dictionaries, but they are hashable (so they can be stored in sets) and their ``get`` method
accepts dot notation for nested items.
Getting calibration files
~~~~~~~~~~~~~~~~~~~~~~~~~
Calibration files can be obtained using a ``CalibCollection`` instance, e.g.:

.. code-block:: python

    from huntsman.drp.collection import CalibCollection
    collection = CalibCollection.from_config()
    calib_docs = collection.find({"datasetType": "flat"})

Here, ``datasetType`` specifies the type of calibration, e.g. ``bias``, ``defects``, ``dark`` or ``flat``.
Alternatively, a complete set of calibration documents that match a document in the ``ExposureCollection``
can be obtained like this:

.. code-block:: python

    calib_docs = collection.get_matching_calibs(exposure_document)

Master calibration files are stored in :ref:`the archive directory`.
To get the files onto your local machine, download them from the archive directory mounted on the
host system.
Documents
=========