Using the database
huntsman-drp uses a MongoDB database to store file metadata. MongoDB databases store nested metadata in a form similar to Python dictionaries. It is recommended that you familiarise yourself with the basics of MongoDB before proceeding.
The relevant terminology is as follows:
Document: A single item in the database. This is in some sense equivalent to a “row” in a standard database table, but can contain nested / hierarchical data.
Collection: A collection is similar to a table in a standard database, but contains a set of documents rather than rows. Documents in a collection do not have to share the same data structure.
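To make the terminology concrete, the sketch below models two documents as plain Python dictionaries. The field names are borrowed from the query examples on this page, but the overall layout is illustrative only and is not the actual huntsman-drp schema.

```python
# Two illustrative documents, modelled as plain Python dicts.
# ASSUMPTION: the layout is hypothetical, not the real huntsman-drp schema.
doc_a = {
    "physical_filter": "g_band",
    "exposure_time": 1,
    "metrics": {"has_wcs": True},  # nested (hierarchical) data
}
doc_b = {
    "physical_filter": "r_band",
    "exposure_time": 30,
    # No "metrics" field: documents in one collection need not share a structure.
}

# A collection can be pictured as a set of such documents.
collection = [doc_a, doc_b]

# Nested values are reached by chaining keys, as with any Python dict.
has_wcs = doc_a["metrics"]["has_wcs"]
```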
Collections
huntsman-drp implements the Collection class as a wrapper around MongoDB collections to ensure standardisation of the database. Collection instances are used to perform all database operations, including inserting, deleting, modifying and searching for documents.
The following Collection subclasses are currently implemented:
ExposureCollection: stores metadata for all exposures, with each document corresponding to a single exposure.
CalibCollection: stores metadata for master calibration files, with each document corresponding to a single calibration file.
Collection instances should be created like this:
from huntsman.drp.collection import ExposureCollection
collection = ExposureCollection.from_config()
Using from_config ensures that class instances are correctly initialised from the config file. Once a Collection instance is created, it automatically connects to MongoDB and is ready for use.
Querying for files
File queries are performed using document filters, which are simple Python dictionaries. For example, to find all one-second exposures taken using the g_band filter, one can do:
document_filter = {"physical_filter": "g_band", "exposure_time": 1}
docs = collection.find(document_filter)
Dot notation can be used to query for nested items, e.g.:
document_filter = {"metrics.has_wcs": True}
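The toy matcher below is a rough, in-memory approximation of how an equality filter is evaluated: every key in the filter must match (a logical AND), and a dotted key such as "metrics.has_wcs" is resolved by walking into nested dictionaries. It is a simplified illustration, not how MongoDB or huntsman-drp is implemented; for instance, it ignores MongoDB's special handling of arrays and null values.

```python
_MISSING = object()  # sentinel meaning "field not present"


def get_dotted(doc, key):
    """Follow a dot-separated key such as "metrics.has_wcs" into nested dicts."""
    value = doc
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def matches(doc, document_filter):
    """Equality-only matching: every key in the filter must match (logical AND)."""
    return all(
        get_dotted(doc, key) == expected for key, expected in document_filter.items()
    )


# Illustrative documents (hypothetical values).
docs = [
    {"physical_filter": "g_band", "exposure_time": 1, "metrics": {"has_wcs": True}},
    {"physical_filter": "g_band", "exposure_time": 30, "metrics": {"has_wcs": False}},
    {"physical_filter": "r_band", "exposure_time": 1},
]

one_second_g_band = [
    d for d in docs if matches(d, {"physical_filter": "g_band", "exposure_time": 1})
]
with_wcs = [d for d in docs if matches(d, {"metrics.has_wcs": True})]
```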
For more advanced queries, we can use MongoDB query operators, which are specified as part of the document filter. For example, to query for files with exposure times greater than one second, we can do:
document_filter = {"exposure_time": {"$gt": 1}}
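Extending the same idea, the sketch below evaluates a small subset of MongoDB comparison operators ($eq, $gt, $gte, $lt, $lte and $in) in pure Python. Several operators on the same field are combined with AND, so a range can be written as {"$gte": 1, "$lt": 30}. This is a simplified model for building intuition only; real MongoDB semantics differ in places, for example for comparisons across types or for documents that lack the field.

```python
import operator

_MISSING = object()  # sentinel meaning "field not present"

# A subset of MongoDB comparison operators mapped onto Python callables.
# Operators not listed here (e.g. "$ne") raise KeyError in this sketch.
_OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def get_dotted(doc, key):
    """Resolve a dot-separated key into nested dicts, or return _MISSING."""
    value = doc
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def field_matches(value, condition):
    """Test one field value against a plain value or a dict of operators."""
    is_operator_dict = (
        isinstance(condition, dict)
        and bool(condition)
        and all(key.startswith("$") for key in condition)
    )
    if not is_operator_dict:
        return value == condition  # plain equality
    if value is _MISSING:
        return False  # simplification: missing fields never match operators here
    try:
        return all(_OPERATORS[op](value, arg) for op, arg in condition.items())
    except TypeError:
        return False  # e.g. comparing a string with a number


def matches(doc, document_filter):
    """All filter keys must match (logical AND)."""
    return all(
        field_matches(get_dotted(doc, key), condition)
        for key, condition in document_filter.items()
    )


docs = [
    {"physical_filter": "g_band", "exposure_time": 1},
    {"physical_filter": "g_band", "exposure_time": 30},
    {"physical_filter": "r_band", "exposure_time": 0.5},
]

longer_than_1s = [d for d in docs if matches(d, {"exposure_time": {"$gt": 1}})]
in_range = [d for d in docs if matches(d, {"exposure_time": {"$gte": 1, "$lt": 30}})]
g_or_r = [d for d in docs if matches(d, {"physical_filter": {"$in": ["g_band", "r_band"]}})]
```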
There are also various keyword arguments that can be used with collection.find. For example, to query within a date range, we can do:
docs = collection.find(date_min="2021-01-01", date_max="2021-02-01")
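How collection.find turns date_min and date_max into a query is not described here. As a hedged illustration, such keyword arguments could be translated into a standard MongoDB range filter using $gte and $lt, as in the hypothetical helper below. The helper name, the field name "date", and the choice of an inclusive lower bound and exclusive upper bound are all assumptions; the Collection API is the authority on the actual behaviour.

```python
from datetime import datetime


def date_range_filter(date_min=None, date_max=None, date_key="date"):
    """Hypothetical helper: build a MongoDB-style range filter from ISO date strings.

    ASSUMPTIONS: the date field is called "date", date_min is inclusive and
    date_max is exclusive. None of this is taken from huntsman-drp.
    """
    condition = {}
    if date_min is not None:
        condition["$gte"] = datetime.fromisoformat(date_min)
    if date_max is not None:
        condition["$lt"] = datetime.fromisoformat(date_max)
    return {date_key: condition} if condition else {}


# Combine with an ordinary document filter by merging the dictionaries.
document_filter = {
    "physical_filter": "g_band",
    **date_range_filter("2021-01-01", "2021-02-01"),
}
```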
Please see the Collection API documentation for more details.
The return value of Collection.find is a list of Document objects. These behave very similarly to Python dictionaries, but they are hashable (so they can be stored in sets) and support get calls with “dot” notation for nested items.
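The behaviour described above can be sketched with a small dict subclass. This is a hypothetical illustration of a hashable, dot-aware document, not the huntsman-drp Document implementation: equal contents freeze to equal hashable values, and get walks dotted keys into nested dictionaries.

```python
from typing import Any


def _freeze(obj: Any) -> Any:
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((key, _freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(_freeze(value) for value in obj)
    if isinstance(obj, set):
        return frozenset(_freeze(value) for value in obj)
    return obj


class Document(dict):
    """Hypothetical dict-like document that is hashable and supports dotted get()."""

    def __hash__(self) -> int:
        # Equal documents freeze to equal values, so they hash identically.
        # The hash changes if the document is mutated, so avoid modifying
        # a Document while it is stored in a set.
        return hash(_freeze(self))

    def get(self, key: str, default: Any = None) -> Any:
        # "metrics.has_wcs" -> self["metrics"]["has_wcs"]
        value: Any = self
        for part in key.split("."):
            if not isinstance(value, dict) or part not in value:
                return default
            value = value[part]
        return value


d1 = Document({"exposure_time": 1, "metrics": {"has_wcs": True}})
d2 = Document({"metrics": {"has_wcs": True}, "exposure_time": 1})
unique_docs = {d1, d2}  # duplicates collapse because equal documents hash equally
```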
Getting calibration files
Calibration files can be obtained using a CalibCollection instance, e.g.:
from huntsman.drp.collection import CalibCollection
collection = CalibCollection.from_config()
calib_docs = collection.find({"datasetType": "flat"})
Here, datasetType specifies the type of calibration, e.g. bias, defects, dark or flat. Alternatively, a complete set of calibration documents that match a document in the ExposureCollection can be obtained like this:
# exposure_document: a Document returned by an ExposureCollection query
calib_docs = collection.get_matching_calibs(exposure_document)
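The exact rules used by get_matching_calibs are not described here. Purely to illustrate the idea, the hypothetical function below picks, for each calibration type, the calibration document whose selected fields agree with the exposure and whose date is closest to it. The MATCH_KEYS table, the "camera" and "date" fields and the closest-in-time rule are all assumptions made for this sketch.

```python
from datetime import datetime

# ASSUMPTION: fields a calib of each type must share with the exposure.
# Illustrative only; these are not huntsman-drp's real matching rules.
MATCH_KEYS = {
    "bias": ["camera"],
    "dark": ["camera"],
    "flat": ["camera", "physical_filter"],
}


def match_calibs_sketch(exposure_doc, calib_docs, match_keys=MATCH_KEYS, date_key="date"):
    """Return {datasetType: calib_doc}, choosing the closest-in-time match per type."""
    matched = {}
    for dataset_type, keys in match_keys.items():
        candidates = [
            calib for calib in calib_docs
            if calib.get("datasetType") == dataset_type
            and all(calib.get(key) == exposure_doc.get(key) for key in keys)
        ]
        if candidates:
            # Smallest absolute time difference wins.
            matched[dataset_type] = min(
                candidates,
                key=lambda calib: abs(calib[date_key] - exposure_doc[date_key]),
            )
    return matched


exposure = {"camera": "cam1", "physical_filter": "g_band", "date": datetime(2021, 1, 10)}
calibs = [
    {"datasetType": "flat", "camera": "cam1", "physical_filter": "g_band", "date": datetime(2021, 1, 1)},
    {"datasetType": "flat", "camera": "cam1", "physical_filter": "g_band", "date": datetime(2021, 1, 9)},
    {"datasetType": "flat", "camera": "cam1", "physical_filter": "r_band", "date": datetime(2021, 1, 10)},
    {"datasetType": "bias", "camera": "cam1", "date": datetime(2021, 1, 5)},
    {"datasetType": "dark", "camera": "cam2", "date": datetime(2021, 1, 10)},
]

matched = match_calibs_sketch(exposure, calibs)
```

In this example no dark is returned, because the only dark was taken with a different camera.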
Master calibration files are stored in the archive directory. To get copies of the files on your local machine, download them from the archive directory mounted on the host system.
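As a minimal sketch, assuming you are working on the host where the archive directory is mounted and that each calibration document stores the full path of its file under a "filename" key (a hypothetical field name), the files could be copied to a local directory with the standard library. The function name copy_calibs_locally is also hypothetical.

```python
import shutil
from pathlib import Path


def copy_calibs_locally(calib_docs, destination, filename_key="filename"):
    """Copy archived calibration files into a local directory.

    ASSUMPTION: each document stores the file's full path (as seen on the host
    where the archive is mounted) under `filename_key`; the key is hypothetical.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for doc in calib_docs:
        src = Path(doc[filename_key])
        # copy2 also preserves file metadata such as modification times.
        copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

If the archive is only reachable on a remote machine, a transfer tool such as rsync or scp would be needed instead.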