Utilities

This module contains utility functions for inspecting the structure of HDF5 files and for comparing datasets.

mojito.utils.print_hdf5_structure(path, *, print_attrs=False)

Print the structure of an HDF5 file with visual hierarchy.

Recursively prints the hierarchy of groups and datasets in an HDF5 file, displaying their names, types, and shapes. Optionally prints attributes for groups and datasets.

Parameters:
  • path (str | Path) – Path to the HDF5 file.

  • print_attrs (bool, default: False) – Whether to print attributes for groups and datasets.

Return type:

None

Example

>>> print_hdf5_structure("data.h5", print_attrs=True)
HDF5 Structure of data.h5
└── [GROUP] /
    ├── @version: 1.0
    ├── [DATASET] /dataset1
    │     dtype: float64, shape: (100,)
    └── [GROUP] /group1
        └── [DATASET] /group1/dataset2
              dtype: int32, shape: (50, 50)
              @units: meters
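
To try the example above, a file with the same layout is needed. The following is a minimal sketch that assumes the h5py and NumPy packages are available (the documentation does not state which HDF5 library mojito uses). The helpers make_example_file and list_objects are hypothetical names introduced for illustration and are not part of mojito.utils.

```python
import h5py
import numpy as np


def make_example_file(path):
    """Write a small HDF5 file with the layout shown in the example above."""
    with h5py.File(path, "w") as f:
        # Attribute on the root group, shown as "@version" in the example.
        f.attrs["version"] = "1.0"
        f.create_dataset("dataset1", data=np.zeros(100, dtype="float64"))
        group = f.create_group("group1")
        dset = group.create_dataset("dataset2", data=np.zeros((50, 50), dtype="int32"))
        # Attribute on the dataset, shown as "@units" in the example.
        dset.attrs["units"] = "meters"


def list_objects(path):
    """Return (name, kind, dtype, shape) for every object below the root group."""
    rows = []

    def visit(name, obj):
        # h5py passes names relative to the root, without a leading slash.
        if isinstance(obj, h5py.Dataset):
            rows.append(("/" + name, "DATASET", str(obj.dtype), obj.shape))
        else:
            rows.append(("/" + name, "GROUP", None, None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return rows
```

Calling make_example_file("data.h5") and then print_hdf5_structure("data.h5", print_attrs=True) should produce a tree similar to the one shown, although the exact formatting depends on the mojito implementation.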

mojito.utils.assert_datasets_almost_equal(datasets, *, chunk=100000)

Assert that the given datasets are almost equal, reading them in chunks.

This is a helper function to check that datasets stored in different groups hold almost equal values, without loading them entirely into memory.

Note that chunking is only done along the first dimension of the datasets, so each chunk still spans the full extent of every other dimension. This function is therefore most effective for datasets where the first dimension is the largest.

Parameters:
  • datasets (Sequence[Dataset]) – Sequence of datasets to compare. All datasets must have the same shape.

  • chunk (int, default: 100000) – Chunk size to use on the first axis when comparing the datasets. This limits memory usage when comparing large datasets.

Raises:
  • ValueError – If the chunk size is not positive.

  • AssertionError – If the datasets have different shapes.

  • AssertionError – If the datasets are not almost equal in any of the compared chunks.

Return type:

None
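
The chunked comparison described above can be sketched as follows. This is an illustration of the technique, not the mojito implementation: the function name assert_chunks_almost_equal, the rtol and atol parameters, and the use of numpy.testing.assert_allclose are all assumptions, since the documentation does not state which tolerance or comparison routine is used. The sketch also assumes every dataset has at least one dimension and supports slicing along its first axis, as h5py datasets and NumPy arrays do.

```python
import numpy as np


def assert_chunks_almost_equal(datasets, *, chunk=100_000, rtol=1e-7, atol=0.0):
    """Compare array-like datasets slab by slab along the first axis.

    Hypothetical sketch: rtol and atol are illustrative tolerances.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    if not datasets:
        return

    reference = datasets[0]
    for other in datasets[1:]:
        assert other.shape == reference.shape, "datasets have different shapes"

    n_rows = reference.shape[0]
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        # Only this slab of each dataset is held in memory at a time.
        ref_block = np.asarray(reference[start:stop])
        for other in datasets[1:]:
            # Raises AssertionError if any element differs beyond the tolerance.
            np.testing.assert_allclose(
                np.asarray(other[start:stop]), ref_block, rtol=rtol, atol=atol
            )
```

With this approach, peak memory scales with chunk multiplied by the product of the remaining dimensions, which is why a long first axis benefits most from chunking.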