Download data
This module provides functions to download Mojito L1 data files from the CSC’s Nextcloud server (brick market) and retrieve source parameters, handling authentication and caching.
Attention
Publications using Mojito data are currently not allowed! Please keep in touch, as publication policies will soon be published.
Note
Authentication using a personal Nextcloud access token is required to download Mojito data. Please refer to Authentication for more information on how to generate an access token and authenticate.
Downloading bricks
The main function to download data files is download_brick(), which takes
the type of brick (mbhb, gb, noise, combined, etc.) and an optional source
identifier as arguments.
The function constructs the appropriate download URL based on the brick type and source identifier, then downloads the file using authenticated access to the Nextcloud server. The files are cached locally to avoid redundant downloads, such that only a path to the cached file is returned on subsequent calls.
from mojito.download import download_brick
# Download mbhb brick for source ID 12
brick_path = download_brick("mbhb", source_id=12)
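The caching behaviour described above can be illustrated with a minimal sketch. It is not Mojito's implementation: the helper cached_download, the fetch callback, and the file name mbhb_12.h5 are hypothetical and only show the general "download once, then return the cached path" pattern.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_download(
    filename: str,
    cache_dir: str,
    fetch: Callable[[str], bytes],
) -> str:
    """Return the local path of `filename`, fetching it only if not cached.

    `fetch` is a hypothetical stand-in for the authenticated download;
    it is not part of Mojito.
    """
    target = Path(cache_dir) / filename
    if not target.exists():
        # First call: create the cache directory and store the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(filename))
    # Subsequent calls skip the download and only return the path.
    return str(target)


# Demonstration with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(name: str) -> bytes:
    calls.append(name)
    return b"fake brick contents"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_download("mbhb_12.h5", tmp, fake_fetch)
    second = cached_download("mbhb_12.h5", tmp, fake_fetch)
    same_path = first == second
    fetch_count = len(calls)
```

Both calls return the same path, and the fake fetcher runs only once.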
You can provide authentication credentials via keyword parameters username
and token. If not provided, the function will look for environment variables
MOJITO_USERNAME and MOJITO_TOKEN. If still not found, the user will be
prompted to enter them. Go to Authentication for more details on how to
generate an access token and authenticate.
# Download mbhb brick for source ID 12 with explicit credentials
brick_path = download_brick(
"mbhb", source_id=12, username="my_username", token="my_token"
)
By default, the latest version of the brick is downloaded. To download a
specific version, you can use the version parameter.
newest_brick_path = download_brick("mbhb", source_id=12, version="latest")
older_brick_path = download_brick("mbhb", source_id=12, version=2)
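As a rough illustration of what a version argument of this kind means, the sketch below resolves either "latest" or an integer against a list of available versions. The function resolve_version and the list of available versions are assumptions made for this example; how Mojito actually discovers and resolves versions is not described here.

```python
from typing import Literal, Sequence, Union


def resolve_version(
    version: Union[int, Literal["latest"]],
    available: Sequence[int],
) -> int:
    """Map a requested version to a concrete integer version.

    Hypothetical helper: `available` stands in for whatever list of
    versions the server or registry exposes.
    """
    if not available:
        raise ValueError("no versions available")
    if version == "latest":
        # "latest" is taken to mean the highest version number.
        return max(available)
    if version not in available:
        raise ValueError(
            f"version {version} not found; available: {sorted(available)}"
        )
    return version
```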
By default, the full official L1 version of the brick is downloaded. To download
a reduced version (with downsampled data and potentially missing quantities,
such as LTTs and orbits), you can use the reduced parameter.
full_brick_path = download_brick("mbhb", source_id=12, reduced=False)
reduced_brick_path = download_brick("mbhb", source_id=12, reduced=True)
Warning
Reduced versions are not official L1 files and are not representative of real data, as sampling rates are not correct and some quantities might be missing. Use them only for testing and development purposes.
- mojito.download.download_brick(brick, source_id=None, *, version='latest', reduced=False, cache_dir=None, username=None, token=None)
Download a Mojito data file for the specified brick type and source ID.
- Parameters:
  - brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb'] | Literal['noise'] | Literal['combined']) – The type of data brick to download.
  - source_id (int | None, default: None) – The source identifier to download data for. Only applicable for source-specific bricks like “mbhb”, “emri”, or “gb”.
  - version (int | Literal['latest'], default: 'latest') – The version of the brick to download. Can be an integer version number or “latest” to download the most recent version.
  - reduced (bool, default: False) – Whether to download the reduced version instead of the full, official brick. Reduced versions have downsampled data and may be missing some quantities. They are not official L1 files and should only be used for testing and development purposes.
  - cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.
  - username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().
  - token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().
- Return type: str
- Returns: Path to the downloaded data file.
Source parameters
You can retrieve source parameters from the relevant catalog files using the
get_source_params() function. This function downloads the catalog if
needed and extracts the parameters for the specified source identifier.
from mojito.download import get_source_params
# Get source parameters for mbhb brick, source ID 12
params = get_source_params("mbhb", source_id=12)
You can also download the entire catalog file using the download_catalog()
function.
from mojito.download import download_catalog
# Download mbhb catalog
catalog_path = download_catalog("mbhb")
By default, the latest version of the catalog is downloaded. To download a
specific version, you can use the version parameter.
newest_catalog_path = download_catalog("mbhb", version="latest")
older_catalog_path = download_catalog("mbhb", version=2)
- mojito.download.download_catalog(brick, *, version='latest', cache_dir=None, username=None, token=None)
Download a catalog for the specified brick type.
- Parameters:
  - brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb']) – The type of catalog to download.
  - version (int | Literal['latest'], default: 'latest') – The version of the catalog to download. Can be an integer version number or “latest” to download the most recent version.
  - cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.
  - username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().
  - token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().
- Return type: str
- Returns: Path to the downloaded catalog file.
- mojito.download.get_source_params(brick, source_id, *, version='latest', cache_dir=None, username=None, token=None)
Get source parameters from the relevant catalog.
If needed, this function downloads the source parameter catalog for the specified brick type, then retrieves the parameters for the given source ID.
- Parameters:
  - brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb']) – The type of catalog to retrieve parameters from.
  - source_id (int) – The source identifier to retrieve parameters for.
  - version (int | Literal['latest'], default: 'latest') – The version of the catalog to download. Can be an integer version number or “latest” to download the most recent version.
  - cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.
  - username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().
  - token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().
- Return type: dict[str, Any]
- Returns: Source parameters for the specified source ID.
Authentication
Mojito requires authentication to access the Nextcloud server hosting the data files. This is done using a username and an access token.
Generate access tokens
You first need to create an access token from the Nextcloud web portal (see
NEXTCLOUD_BASE_URL). Go to your account settings, navigate to the
“Security” section, and create a new application password (aka access token).
You can use any application name; we recommend using “Mojito”.
Remember to store this token securely, as it will not be shown again. You can save it as an environment variable; see below.
Authentication methods
You can provide the username and token to Mojito in three ways:
- Environment Variables: Set the environment variables MOJITO_USERNAME and MOJITO_TOKEN with your credentials. Mojito will automatically use these values when downloading data.
- Function Parameters: Pass the username and token parameters directly to the download functions like download_brick(), download_catalog(), or get_source_params(). This method overrides any environment variable settings.
- User Prompt: If neither environment variables nor function parameters are provided, Mojito will prompt you to enter your username and token interactively when a download is attempted.
Tip
You can save your username and token in your shell’s configuration file
to avoid entering them each time. For example, add the following lines to
your .bashrc or .zshrc file:
# .bashrc or .zshrc
# ...
export MOJITO_USERNAME="your_username"
export MOJITO_TOKEN="your_token"
- mojito.download.get_credentials(username=None, token=None)
Get Nextcloud credentials for authentication.
If username and token are both provided, use them directly. If either is missing, first check the environment variables MOJITO_USERNAME and MOJITO_TOKEN and return them if both are set. If still not set, prompt the user for input.
- Parameters:
  - username (str | None, default: None) – The Nextcloud username for authentication.
  - token (str | None, default: None) – The Nextcloud token for authentication.
- Return type: tuple[str, str]
- Returns:
  - username – The Nextcloud username for authentication.
  - token – The Nextcloud token for authentication.
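The lookup order documented for get_credentials() can be sketched as follows. This is an illustrative reimplementation, not Mojito's code: resolve_credentials and its env and prompt parameters are hypothetical and exist only so the logic can be exercised without a terminal. As a simplification, the sketch prompts for both values in the last step, which may differ from what Mojito does; a real tool would probably read the token with getpass.getpass so that it is not echoed.

```python
import os
from typing import Callable, Mapping, Optional, Tuple


def resolve_credentials(
    username: Optional[str] = None,
    token: Optional[str] = None,
    *,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = input,
) -> Tuple[str, str]:
    """Resolve credentials: arguments, then environment, then prompt."""
    # 1. Explicit arguments win when both are given.
    if username is not None and token is not None:
        return username, token
    # 2. Otherwise use the environment, but only if both variables are set.
    env_user = env.get("MOJITO_USERNAME")
    env_token = env.get("MOJITO_TOKEN")
    if env_user and env_token:
        return env_user, env_token
    # 3. Fall back to asking the user (simplified: asks for both values).
    return prompt("Username: "), prompt("Token: ")
```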
Caching
By default, files are cached in the directory following the XDG Base Directory
Specification. This
can be overridden by setting the environment variable MOJITO_CACHE_DIR or by
passing a custom path when calling download_brick() or
download_catalog().
Warning
Do not move or rename the cached files, as this will prevent Mojito from locating them in future calls.
Note
If you are using Mojito on a computing cluster, we suggest changing the cache directory to a shared filesystem location, so that the same files are not downloaded multiple times for different compute nodes or users.
- mojito.download.get_cache_dir(cache_dir=None)
Get the cache directory for Mojito data files.
If cache_dir is provided, it is used directly. Otherwise, first check the environment variable MOJITO_CACHE_DIR. If not set, use the default location from platformdirs.user_cache_dir() with “mojito” as the appname.
- Parameters:
  - cache_dir (str | None, default: None) – The cache directory path, or None to use the environment variable or default location.
- Return type: str
- Returns: The cache directory path.
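The precedence documented for get_cache_dir() can be sketched with the standard library alone. This is an approximation under stated assumptions: resolve_cache_dir and its env parameter are hypothetical, and the default location is emulated with XDG_CACHE_HOME (falling back to ~/.cache) rather than by calling platformdirs, so the result may differ from Mojito's on non-Linux platforms.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    cache_dir: Optional[str] = None,
    *,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Resolve the cache directory: argument, then environment, then default."""
    # 1. An explicit argument is used directly.
    if cache_dir is not None:
        return cache_dir
    # 2. Otherwise honour MOJITO_CACHE_DIR when it is set.
    from_env = env.get("MOJITO_CACHE_DIR")
    if from_env:
        return from_env
    # 3. Stdlib approximation of the XDG default (assumption: Linux layout).
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return str(Path(base) / "mojito")
```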
Downloading arbitrary files
The download_file() function can be used to download any file from the
Nextcloud server, given its URL. This can be useful for downloading files that
are not listed in the registry, but be cautious when using this method, as it
can bypass security checks. Always ensure that you are downloading from trusted
sources when using this function.
- mojito.download.download_file(download_url, cache_dir=None, username=None, token=None, *, progressbar=True, unsafe=False)
Download a Mojito data file from Nextcloud.
Warning
Using unsafe=True bypasses the registry check and checksum verification, which can lead to security risks if downloading files from untrusted sources. Use with caution and only for trusted URLs.
- Parameters:
  - download_url (str) – The URL of the data file to download.
  - cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.
  - username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().
  - token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().
  - progressbar (bool, default: True) – Whether to show a progress bar during download.
  - unsafe (bool, default: False) – Whether to use pooch’s retrieve method instead of fetch. This can be used to bypass the registry check and download files not listed in it, avoid checksum verification, and calculate and print the file’s hash after download.
- Return type: str
- Returns: Path to the downloaded data file.
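To make concrete what is skipped when checksum verification is bypassed, the sketch below verifies a file against a known SHA-256 digest using only the standard library. It is independent of pooch and of Mojito's registry; the helper names sha256_of and verify_checksum are hypothetical, and the choice of SHA-256 is an assumption for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the expected digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )


# Demonstration on a temporary file with known contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    known = hashlib.sha256(b"hello").hexdigest()
    verify_checksum(str(sample), known)  # passes silently
    digest_matches = sha256_of(str(sample)) == known
```

A file downloaded without such a check could be truncated or altered without any error being raised, which is why unverified downloads should be limited to trusted URLs.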
Other download methods
In addition to the provided functions, a command-line interface is available. It allows you to download bricks and catalogs, or clear the cache, directly from the terminal. Please consult Command-line interface for more information.
One can also download files manually using a web browser or command-line tools
like rclone. The base URL for accessing the globalstorage directory on
Nextcloud is given by NEXTCLOUD_BASE_URL.
- mojito.download.NEXTCLOUD_BASE_URL = 'https://nextcloud-dcc-fi-csc-okd-globalstorage1.2.rahtiapp.fi'
Base URL for globalstorage on Nextcloud (rahtiapp.fi).