Using the database

huntsman-drp uses a mongodb database to store file metadata. mongodb stores documents as nested key-value structures, similar to Python dictionaries. It is recommended to familiarise yourself with the basics of mongodb before proceeding.

The relevant terminology is as follows:

Document: A single item in the database. This is roughly equivalent to a “row” in a relational database table, but can contain nested / hierarchical data.

Collection: A collection is similar to a table in a relational database, but contains a set of documents rather than rows. Documents in a collection do not have to share the same data structure.
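As a rough illustration of these terms, the sketch below models a collection as a plain Python list of dictionaries. It does not touch mongodb. The filename field and all values are hypothetical; physical_filter, exposure_time and metrics.has_wcs are the field names used in the query examples on this page.

```python
# A "collection" modelled as a list of "documents" (plain dictionaries).
# Field values and the "filename" key are illustrative only.
collection = [
    {
        "filename": "exposure_001.fits",
        "physical_filter": "g_band",
        "exposure_time": 1,
        "metrics": {"has_wcs": True},  # nested / hierarchical data
    },
    {
        "filename": "exposure_002.fits",
        "physical_filter": "r_band",
        "exposure_time": 30,
        # No "metrics" entry: documents need not share the same structure
    },
]

for doc in collection:
    print(doc["filename"], "metrics" in doc)
# exposure_001.fits True
# exposure_002.fits False
```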

Collections

huntsman-drp implements the Collection class as a wrapper around mongodb collections to ensure standardisation of the database. Collection instances are used to perform all database operations, including inserting, deleting, modifying and searching for documents.
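A minimal sketch of the wrapper idea, assuming an in-memory list in place of a real mongodb collection. The class name InMemoryCollection, the required_fields rule and the method names (borrowed from pymongo's insert_one / find / delete_many) are illustrative assumptions, not the huntsman-drp API. Routing every operation through one class gives a single place to enforce standardisation.

```python
from copy import deepcopy


class InMemoryCollection:
    """Hypothetical stand-in for a collection wrapper (not the huntsman-drp API)."""

    # Fields every document must provide: an assumed standardisation rule
    required_fields = ("filename",)

    def __init__(self):
        self._docs = []

    def _validate(self, doc):
        missing = [field for field in self.required_fields if field not in doc]
        if missing:
            raise ValueError(f"Document is missing required fields: {missing}")

    @staticmethod
    def _matches(doc, document_filter):
        # Simple equality matching on top-level keys only
        return all(key in doc and doc[key] == value
                   for key, value in document_filter.items())

    def insert_one(self, doc):
        self._validate(doc)
        self._docs.append(deepcopy(doc))  # store a copy so callers cannot mutate it

    def find(self, document_filter=None):
        document_filter = document_filter or {}
        return [deepcopy(d) for d in self._docs if self._matches(d, document_filter)]

    def delete_many(self, document_filter):
        keep = [d for d in self._docs if not self._matches(d, document_filter)]
        n_deleted = len(self._docs) - len(keep)
        self._docs = keep
        return n_deleted


collection = InMemoryCollection()
collection.insert_one({"filename": "a.fits", "physical_filter": "g_band"})
collection.insert_one({"filename": "b.fits", "physical_filter": "r_band"})
print(len(collection.find({"physical_filter": "g_band"})))  # 1
```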

The following Collection subclasses are currently implemented:

ExposureCollection: This is used to store metadata for all exposures, with each document corresponding to a single exposure.

CalibCollection: This stores metadata for master calibration files. Each document corresponds to a single calibration file.

Collection instances should be created like this:

from huntsman.drp.collection import ExposureCollection
collection = ExposureCollection.from_config()

Using from_config ensures that the instance is correctly initialised from the config file. Once created, a Collection instance automatically connects to the mongodb database and is ready for use.
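The from_config pattern is a classmethod that reads settings and returns a ready-made instance. The sketch below is hypothetical: the config layout, the ConfiguredCollection and ExampleExposureCollection names, and the explicit config argument are assumptions made so the example is self-contained (the real from_config reads the huntsman-drp config file, and its signature may differ). It only records connection settings and does not connect to a server.

```python
class ConfiguredCollection:
    """Hypothetical base class showing the alternative-constructor pattern."""

    def __init__(self, db_name, collection_name, host="localhost", port=27017):
        self.db_name = db_name
        self.collection_name = collection_name
        self.host = host
        self.port = port  # 27017 is mongodb's default port

    @classmethod
    def from_config(cls, config):
        """Build an instance from a config mapping (assumed layout)."""
        db_settings = config["mongodb"]
        # Each subclass looks up its own entry by class name
        collection_settings = config["collections"][cls.__name__]
        return cls(
            db_name=db_settings["db_name"],
            collection_name=collection_settings["name"],
            host=db_settings.get("host", "localhost"),
            port=db_settings.get("port", 27017),
        )


class ExampleExposureCollection(ConfiguredCollection):
    """Hypothetical subclass; inherits from_config unchanged."""


# Assumed config layout, for illustration only
config = {
    "mongodb": {"db_name": "example_db", "host": "localhost"},
    "collections": {"ExampleExposureCollection": {"name": "exposures"}},
}

collection = ExampleExposureCollection.from_config(config)
print(collection.collection_name, collection.port)  # exposures 27017
```

Because from_config is a classmethod that uses cls, each subclass gets a correctly configured instance of its own type without repeating the construction logic.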

Querying for files

File queries are performed using document filters, which are simple Python dictionaries. For example, to find all one-second exposures taken with the g_band filter, one can do:

document_filter = {"physical_filter": "g_band", "exposure_time": 1}
docs = collection.find(document_filter)
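Conceptually, a plain equality filter like this matches a document when every key in the filter is present in the document with an equal value. The pure-Python sketch below illustrates that rule on hypothetical documents. mongodb itself performs the matching server-side and also handles cases (arrays, type-specific comparisons) that this sketch ignores; matches is a hypothetical helper name.

```python
def matches(document, document_filter):
    """Return True if every filter key is in the document with an equal value."""
    return all(key in document and document[key] == value
               for key, value in document_filter.items())


documents = [
    {"physical_filter": "g_band", "exposure_time": 1},
    {"physical_filter": "g_band", "exposure_time": 30},
    {"physical_filter": "r_band", "exposure_time": 1},
]
document_filter = {"physical_filter": "g_band", "exposure_time": 1}
print([d for d in documents if matches(d, document_filter)])
# [{'physical_filter': 'g_band', 'exposure_time': 1}]
```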

Dot notation can be used to query for nested items, e.g.:

document_filter = {"metrics.has_wcs": True}
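Dot notation works by following each dot-separated part of the key one level deeper into the nested document. A minimal pure-Python sketch of that lookup, assuming documents are nested dictionaries (mongodb's own dot notation can also index into arrays, which is not handled here); get_dotted is a hypothetical helper name.

```python
def get_dotted(document, dotted_key, default=None):
    """Follow a key such as "metrics.has_wcs" through nested dictionaries."""
    value = document
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return default  # path does not exist in this document
        value = value[part]
    return value


doc = {"filename": "a.fits", "metrics": {"has_wcs": True}}
print(get_dotted(doc, "metrics.has_wcs"))  # True
print(get_dotted(doc, "metrics.fwhm"))     # None
```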

For more advanced queries, we can use mongodb query operators, which are specified as part of the document filter. For example, to query for files with exposure times greater than one second, we can do:

document_filter = {"exposure_time": {"$gt": 1}}
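To illustrate how operator filters are interpreted, the sketch below maps a few mongodb comparison operators ($eq, $ne, $gt, $gte, $lt, $lte, $in) to Python functions and applies them to hypothetical documents. It is a simplified, hypothetical matcher for intuition only; for example, real mongodb $ne also matches documents where the field is missing, whereas here missing fields never match.

```python
import operator

# Subset of mongodb comparison operators mapped to Python equivalents
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def field_matches(value, condition):
    # A dict whose keys all start with "$" is treated as a set of operators
    if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
        return all(OPERATORS[op](value, arg) for op, arg in condition.items())
    return value == condition  # otherwise fall back to plain equality


def matches(document, document_filter):
    return all(key in document and field_matches(document[key], condition)
               for key, condition in document_filter.items())


documents = [{"exposure_time": t} for t in (0.5, 1, 5, 30)]
document_filter = {"exposure_time": {"$gt": 1}}
print([d["exposure_time"] for d in documents if matches(d, document_filter)])  # [5, 30]
```

Several operators in one condition are combined with AND, so {"$gte": 1, "$lt": 5} selects a half-open range.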

There are also various keyword arguments that can be used with collection.find. For example, to query within a date range, we can do:

docs = collection.find(date_min="2021-01-01", date_max="2021-02-01")
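Keyword arguments such as date_min and date_max are a convenience over writing the operator filter yourself. The sketch below shows how such a range could be expressed as a mongodb-style filter. It assumes the date is stored in a field called date and that date_max is exclusive; both are assumptions, so check the Collection API for the actual field name and boundary behaviour. date_range_filter is a hypothetical helper name.

```python
from datetime import datetime


def date_range_filter(date_min=None, date_max=None, key="date"):
    """Build a filter restricting `key` to date_min <= value < date_max."""
    condition = {}
    if date_min is not None:
        condition["$gte"] = datetime.fromisoformat(date_min)
    if date_max is not None:
        condition["$lt"] = datetime.fromisoformat(date_max)  # exclusive upper bound (assumed)
    return {key: condition} if condition else {}


document_filter = date_range_filter("2021-01-01", "2021-02-01")
print(document_filter)
```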

Please see the Collection API for more details.

The return value from Collection.find is a list of Document objects. These behave very similarly to Python dictionaries, but they are hashable (so they can be stored in sets) and support get calls using dot notation for nested items.
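A minimal sketch of how a dictionary can be made hashable and given dotted get calls, assuming nothing about the real Document class beyond what is described above. SketchDocument is a hypothetical name; it hashes a recursively frozen copy of its contents, so equal documents collapse to one entry in a set.

```python
def _freeze(value):
    """Recursively convert dicts and lists into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((key, _freeze(item)) for key, item in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(item) for item in value)
    return value


class SketchDocument(dict):
    """Hypothetical dict subclass that is hashable and supports dotted get()."""

    def __hash__(self):
        # Equal documents freeze to equal values, so they hash identically
        return hash(_freeze(self))

    def get(self, key, default=None):
        value = self
        for part in key.split("."):
            if not isinstance(value, dict) or part not in value:
                return default
            value = value[part]
        return value


doc_a = SketchDocument({"filename": "a.fits", "metrics": {"has_wcs": True}})
doc_b = SketchDocument({"filename": "a.fits", "metrics": {"has_wcs": True}})
print(len({doc_a, doc_b}))           # 1
print(doc_a.get("metrics.has_wcs"))  # True
```

Because the hash is computed from the contents, a document should not be modified after it has been added to a set.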

Getting calibration files

Calibration files can be obtained using a CalibCollection instance, e.g.:

from huntsman.drp.collection import CalibCollection
collection = CalibCollection.from_config()
calib_docs = collection.find({"datasetType": "flat"})

Here, datasetType specifies the type of calibration, e.g. bias, defects, dark or flat. Alternatively, the complete set of calibration documents matching a document in the ExposureCollection can be obtained like this:

calib_docs = collection.get_matching_calibs(exposure_document)
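The exact matching rules used by get_matching_calibs are not described here. Purely as an illustration of the idea, the sketch below picks, for each calibration type, the calib document whose selected fields equal the exposure's and whose date is closest to it. The MATCH_KEYS table, the camera_name and date fields, and the closest-in-time rule are all hypothetical assumptions, and unlike a complete set this sketch simply omits types with no match.

```python
from datetime import datetime

# Hypothetical: fields each calib type must share with the exposure
MATCH_KEYS = {
    "bias": ("camera_name",),
    "dark": ("camera_name",),
    "flat": ("camera_name", "physical_filter"),
}


def get_matching_calibs_sketch(exposure, calib_docs, match_keys=MATCH_KEYS):
    """Return {datasetType: calib_doc}, choosing the closest-in-time match per type."""
    matched = {}
    for dataset_type, keys in match_keys.items():
        candidates = [
            calib for calib in calib_docs
            if calib.get("datasetType") == dataset_type
            and all(calib.get(key) == exposure.get(key) for key in keys)
        ]
        if not candidates:
            continue  # this sketch silently skips types with no match
        matched[dataset_type] = min(
            candidates,
            key=lambda calib: abs((calib["date"] - exposure["date"]).total_seconds()),
        )
    return matched


exposure = {"camera_name": "cam1", "physical_filter": "g_band", "date": datetime(2021, 1, 15)}
calib_docs = [
    {"datasetType": "bias", "camera_name": "cam1", "date": datetime(2021, 1, 1)},
    {"datasetType": "bias", "camera_name": "cam1", "date": datetime(2021, 1, 14)},
    {"datasetType": "flat", "camera_name": "cam1", "physical_filter": "r_band", "date": datetime(2021, 1, 15)},
    {"datasetType": "flat", "camera_name": "cam1", "physical_filter": "g_band", "date": datetime(2021, 1, 10)},
]
calibs = get_matching_calibs_sketch(exposure, calib_docs)
print({k: v["date"].day for k, v in calibs.items()})  # {'bias': 14, 'flat': 10}
```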

Master calibration files are stored in the archive directory. To work with them on your local machine, copy them from the archive directory as mounted on the host system.
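A minimal sketch of copying files out of the mounted archive directory with the standard library, assuming you already have the archive paths of the files you want. copy_to_local is a hypothetical helper, and the "filename" key in the usage comment is an assumed field name, not a confirmed part of the calib document schema.

```python
import shutil
from pathlib import Path


def copy_to_local(filenames, destination):
    """Copy files (e.g. from the mounted archive directory) into a local directory."""
    destination = Path(destination).expanduser()
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for filename in filenames:
        source = Path(filename)
        target = destination / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied


# Hypothetical usage, assuming each calib document stores its archive path under "filename":
# copy_to_local([doc["filename"] for doc in calib_docs], "~/huntsman_calibs")
```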
