Download data

This module provides functions to download Mojito L1 data files from the CSC’s Nextcloud server (brick market) and retrieve source parameters, handling authentication and caching.

Attention

Publications using Mojito data are currently not allowed. Publication policies will be announced soon; please stay in touch.

Note

Authentication using a personal Nextcloud access token is required to download Mojito data. Please refer to Authentication for more information on how to generate an access token and authenticate.

Downloading bricks

The main function to download data files is download_brick(), which takes the type of brick (mbhb, gb, noise, combined, etc.) and an optional source identifier as arguments.

The function constructs the appropriate download URL based on the brick type and source identifier, then downloads the file using authenticated access to the Nextcloud server. The files are cached locally to avoid redundant downloads, such that only a path to the cached file is returned on subsequent calls.
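The caching behaviour described above can be illustrated with a minimal sketch. It is not Mojito's implementation: `fetch_with_cache` and its `fetch` argument are hypothetical names, and `fetch` stands in for the authenticated download step.

```python
from pathlib import Path
from typing import Callable


def fetch_with_cache(
    filename: str,
    cache_dir: str,
    fetch: Callable[[Path], None],
) -> str:
    """Return the cached path for `filename`, calling `fetch` only on a miss.

    `fetch` is a hypothetical stand-in for the authenticated download. It
    receives the destination path and must write the file there.
    """
    target = Path(cache_dir) / filename
    if not target.exists():
        # Cache miss: create the cache directory and download once.
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(target)
    # Cache hit or fresh download: only the path is returned.
    return str(target)
```

Calling the helper twice with the same file name triggers `fetch` once; the second call only returns the path.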

from mojito.download import download_brick

# Download mbhb brick for source ID 12
brick_path = download_brick("mbhb", source_id=12)

You can provide authentication credentials via the keyword parameters username and token. If they are not provided, the function looks for the environment variables MOJITO_USERNAME and MOJITO_TOKEN. If these are not set either, you are prompted to enter the credentials. See Authentication for more details on how to generate an access token and authenticate.

# Download mbhb brick for source ID 12 with explicit credentials
brick_path = download_brick(
    "mbhb", source_id=12, username="my_username", token="my_token"
)

By default, the latest version of the brick is downloaded. To download a specific version, you can use the version parameter.

newest_brick_path = download_brick("mbhb", source_id=12, version="latest")
older_brick_path = download_brick("mbhb", source_id=12, version=2)

By default, the full official L1 version of the brick is downloaded. To download a reduced version (with downsampled data and potentially missing quantities, such as LTTs and orbits), you can use the reduced parameter.

full_brick_path = download_brick("mbhb", source_id=12, reduced=False)
reduced_brick_path = download_brick("mbhb", source_id=12, reduced=True)

Warning

Reduced versions are not official L1 files and are not representative of real data, as sampling rates are not correct and some quantities might be missing. Use them only for testing and development purposes.

mojito.download.download_brick(brick, source_id=None, *, version='latest', reduced=False, cache_dir=None, username=None, token=None)[source]

Download a Mojito data file for the specified brick type and source ID.

Parameters:
  • brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb'] | Literal['noise'] | Literal['combined']) – The type of data brick to download.

  • source_id (int | None, default: None) – The source identifier to download data for. Only applicable for source-specific bricks like “mbhb”, “emri”, or “gb”.

  • version (int | Literal['latest'], default: 'latest') – The version of the brick to download. Can be an integer version number or “latest” to download the most recent version.

  • reduced (bool, default: False) – Whether to download the reduced version instead of the full, official brick. Reduced versions have downsampled data and may be missing some quantities. They are not official L1 files and should only be used for testing and development purposes.

  • cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.

  • username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().

  • token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().

Return type:

str

Returns:

Path to the downloaded data file.

Source parameters

You can retrieve source parameters from the relevant catalog files using the get_source_params() function. This function downloads the catalog if needed and extracts the parameters for the specified source identifier.

from mojito.download import get_source_params

# Get source parameters for mbhb brick, source ID 12
params = get_source_params("mbhb", source_id=12)

You can also download the entire catalog file using the download_catalog() function.

from mojito.download import download_catalog

# Download mbhb catalog
catalog_path = download_catalog("mbhb")

By default, the latest version of the catalog is downloaded. To download a specific version, you can use the version parameter.

newest_catalog_path = download_catalog("mbhb", version="latest")
older_catalog_path = download_catalog("mbhb", version=2)

mojito.download.download_catalog(brick, *, version='latest', cache_dir=None, username=None, token=None)[source]

Download a catalog for the specified brick type.

Parameters:
  • brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb']) – The type of catalog to download.

  • version (int | Literal['latest'], default: 'latest') – The version of the catalog to download. Can be an integer version number or “latest” to download the most recent version.

  • cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.

  • username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().

  • token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().

Return type:

str

Returns:

Path to the downloaded catalog file.

mojito.download.get_source_params(brick, source_id, *, version='latest', cache_dir=None, username=None, token=None)[source]

Get source parameters from relevant catalog.

If needed, this function downloads the source parameter catalog for the specified brick type; then retrieves parameters for the given source ID.

Parameters:
  • brick (Literal['emri', 'mbhb', 'sobhb', 'vgb', 'gb']) – The type of catalog to retrieve parameters from.

  • source_id (int) – The source identifier to retrieve parameters for.

  • version (int | Literal['latest'], default: 'latest') – The version of the catalog to download. Can be an integer version number or “latest” to download the most recent version.

  • cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.

  • username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().

  • token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().

Return type:

dict[str, Any]

Returns:

Source parameters for the specified source ID.
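When parameters are needed for several sources, the calls can be wrapped in a small helper. The sketch below is an illustration only: `collect_source_params` is a hypothetical name, and the lookup function is passed in as an argument so that the helper does not depend on a particular signature beyond `(brick, source_id=...)`.

```python
from typing import Any, Callable, Dict, Iterable


def collect_source_params(
    get_params: Callable[..., Dict[str, Any]],
    brick: str,
    source_ids: Iterable[int],
    **kwargs: Any,
) -> Dict[int, Dict[str, Any]]:
    """Map each source ID to its parameter dictionary.

    `get_params` is expected to behave like get_source_params(); extra
    keyword arguments (for example version or cache_dir) are forwarded.
    """
    return {
        source_id: get_params(brick, source_id=source_id, **kwargs)
        for source_id in source_ids
    }
```

With Mojito installed, passing `get_source_params` as `get_params` would collect the parameters of, say, sources 10 to 12 in one dictionary. The catalog should only be downloaded once, since later calls reuse the cached file.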

Authentication

Mojito requires authentication to access the Nextcloud server hosting the data files. This is done using a username and an access token.

Generate access tokens

You first need to create an access token from the Nextcloud web portal (see NEXTCLOUD_BASE_URL). Go to your account settings, navigate to the “Security” section, and create a new application password (also known as an access token).

You can use any application name; we recommend using “Mojito”.

Remember to store this token securely, as it will not be shown again. You can save it as an environment variable; see below.

Authentication methods

You can provide the username and token to Mojito in three ways:

  1. Environment Variables: Set the environment variables MOJITO_USERNAME and MOJITO_TOKEN with your credentials. Mojito will automatically use these values when downloading data.

  2. Function Parameters: Pass the username and token parameters directly to the download functions like download_brick(), download_catalog(), or get_source_params(). This method overrides any environment variable settings.

  3. User Prompt: If neither environment variables nor function parameters are provided, Mojito will prompt you to enter your username and token interactively when a download is attempted.
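The order of precedence between these three methods can be sketched as follows. This is a minimal illustration rather than the code of get_credentials(): `resolve_credentials`, `environ`, `ask_username`, and `ask_token` are hypothetical names, and the sketch assumes that only values not passed explicitly are prompted for.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Optional, Tuple


def resolve_credentials(
    username: Optional[str] = None,
    token: Optional[str] = None,
    *,
    environ: Mapping[str, str] = os.environ,
    ask_username: Callable[[str], str] = input,
    ask_token: Callable[[str], str] = getpass,
) -> Tuple[str, str]:
    """Resolve credentials: explicit arguments, then environment, then prompt."""
    # 1. Explicit function parameters take precedence.
    if username is not None and token is not None:
        return username, token
    # 2. Fall back to the environment if both variables are set.
    env_user = environ.get("MOJITO_USERNAME")
    env_token = environ.get("MOJITO_TOKEN")
    if env_user and env_token:
        return env_user, env_token
    # 3. Otherwise prompt for whatever is still missing (assumption).
    if username is None:
        username = ask_username("Nextcloud username: ")
    if token is None:
        token = ask_token("Nextcloud token: ")
    return username, token
```

The prompt callables are injectable so that the logic can be exercised without a terminal; by default the token is read with `getpass` so that it is not echoed.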

Tip

You can save your username and token in your shell’s configuration file to avoid entering them each time. For example, add the following lines to your .bashrc or .zshrc file:

# .bashrc or .zshrc
# ...
export MOJITO_USERNAME="your_username"
export MOJITO_TOKEN="your_token"

mojito.download.get_credentials(username=None, token=None)[source]

Get Nextcloud credentials for authentication.

If username and token are provided, use them directly. If one of them is not provided, first check the environment variables MOJITO_USERNAME and MOJITO_TOKEN and return them if both are set. If still not set, prompt the user for input.

Parameters:
  • username (str | None, default: None) – The Nextcloud username for authentication.

  • token (str | None, default: None) – The Nextcloud token for authentication.

Return type:

tuple[str, str]

Returns:

username

The Nextcloud username for authentication.

token

The Nextcloud token for authentication.

Caching

By default, files are cached in a directory that follows the XDG Base Directory Specification. This can be overridden by setting the environment variable MOJITO_CACHE_DIR or by passing a custom path when calling download_brick() or download_catalog().

Warning

Do not move or rename the cached files, as this will prevent Mojito from locating them in future calls.

Note

If you are using Mojito on a computing cluster, we suggest changing the cache directory to a location on a shared filesystem, so that the same files are not downloaded multiple times for different compute nodes or users.

mojito.download.get_cache_dir(cache_dir=None)[source]

Get the cache directory for Mojito data files.

If cache_dir is provided, it is used directly. Otherwise, first check the environment variable MOJITO_CACHE_DIR. If not set, use the default location from platformdirs.user_cache_dir() with “mojito” as the appname.

Parameters:

cache_dir (str | None, default: None) – The cache directory path, or None to use the environment variable or default location.

Return type:

str

Returns:

The cache directory path.
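The lookup order described for get_cache_dir() can be sketched as follows. The sketch is an approximation: `resolve_cache_dir` and `environ` are hypothetical names, and the last step imitates the Linux default of platformdirs.user_cache_dir() with the standard library instead of calling platformdirs, so the result differs on other platforms.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    cache_dir: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Resolve the cache directory: argument, then environment, then default."""
    # 1. An explicit argument wins.
    if cache_dir is not None:
        return cache_dir
    # 2. Then the MOJITO_CACHE_DIR environment variable.
    env_dir = environ.get("MOJITO_CACHE_DIR")
    if env_dir:
        return env_dir
    # 3. Approximation of the platformdirs default on Linux (assumption):
    #    $XDG_CACHE_HOME/mojito, else ~/.cache/mojito.
    base = environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return str(Path(base) / "mojito")
```

On a cluster, exporting MOJITO_CACHE_DIR once in the job environment therefore has the same effect as passing cache_dir to every call.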

mojito.download.clear_cache(cache_dir=None)[source]

Delete all cached Mojito data files.

Warning

This will remove all files in the Mojito cache directory. Large files might need to be re-downloaded afterwards.

Return type:

None

Downloading arbitrary files

The download_file() function can be used to download any file from the Nextcloud server, given its URL. This can be useful for downloading files that are not listed in the registry, but be cautious when using this method, as it can bypass security checks. Always ensure that you are downloading from trusted sources when using this function.
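When the built-in checksum verification is bypassed, the file can still be checked by hand if its expected hash is known from a trusted channel. The sketch below uses only the standard library; `sha256_of` and `verify_sha256` are hypothetical helper names, and the choice of SHA-256 is an assumption that should be matched to whatever hash the file's provider publishes.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large data files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> None:
    """Raise ValueError if the file's digest does not match `expected`."""
    actual = sha256_of(path)
    if actual != expected.strip().lower():
        raise ValueError(
            f"Checksum mismatch for {path}: expected {expected}, got {actual}"
        )
```

Calling `verify_sha256()` on the path returned by a download raises an error before a corrupted or tampered file is used.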

mojito.download.download_file(download_url, cache_dir=None, username=None, token=None, *, progressbar=True, unsafe=False)[source]

Download a Mojito data file from Nextcloud.

Warning

Using unsafe=True bypasses the registry check and checksum verification, which can lead to security risks if downloading files from untrusted sources. Use with caution and only for trusted URLs.

Parameters:
  • download_url (str) – The URL of the data file to download.

  • cache_dir (str | None, default: None) – The cache directory path, or None to use the default XDG cache directory.

  • username (str | None, default: None) – The Nextcloud username for authentication. If None, use the environment variable or prompt the user. See get_credentials().

  • token (str | None, default: None) – The Nextcloud token/password for authentication. If None, use the environment variable or prompt the user. See get_credentials().

  • progressbar (bool, default: True) – Whether to show a progress bar during download.

  • unsafe (bool, default: False) – Whether to use pooch’s retrieve method instead of fetch. This can be used to bypass the registry check and download files not listed in it, avoid checksum verification, and calculate and print the file’s hash after download.

Return type:

str

Returns:

Path to the downloaded data file.

Other download methods

In addition to the provided functions, a command-line interface is available. It allows you to download bricks, catalogs, or clear the cache directly from the terminal. Please consult Command-line interface for more information.

One can also download files manually using a web browser or command-line tools like rclone. The base URL for accessing the globalstorage directory on Nextcloud is given by NEXTCLOUD_BASE_URL.

mojito.download.NEXTCLOUD_BASE_URL = 'https://nextcloud-dcc-fi-csc-okd-globalstorage1.2.rahtiapp.fi'

Base URL for globalstorage on Nextcloud (rahtiapp.fi).