Benchmarks
==========
Speed benchmarks are available via AirSpeedVelocity
`here `_
Initial benchmarks were performed in the `Beckstein Lab `_
on Spudda, which has:
- 2 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
- 12 total cores
- 32GB RAM
Local file speed tests were performed in the 1.31 TB SSD scratch space using RAID 0.
The following metrics were measured:
- ``ZARRH5MDDiskStrideTime``: Time to iterate through all timesteps in SSD-stored trajectory files
using compressed & uncompressed zarrmd and h5md files.
- ``ZARRH5MDS3StrideTime``: Time to iterate through all timesteps in S3-stored trajectory files
using compressed & uncompressed zarrmd and h5md files.
- ``H5MDReadersDiskStrideTime``: Time to iterate through all timesteps in an SSD-stored trajectory file
using compressed & uncompressed h5md files comparing the :class:`MDAnalysis.coordinates.H5MDReader`
and :class:`zarrtraj.ZARRH5MDReader` classes.
- ``H5MDFmtDiskRMSFTime``: Time to calculate the root mean square fluctuation (RMSF) of the trajectory
using compressed & uncompressed SSD-stored zarrmd files comparing the :class:`MDAnalysis.analysis.rms.RMSF`
method and a ``dask`` parallelized version of the same method.
- ``H5MDFmtAWSRMSFTime``: Time to calculate the root mean square fluctuation (RMSF) of the trajectory
using compressed & uncompressed S3-stored zarrmd files comparing the :class:`MDAnalysis.analysis.rms.RMSF`
method and a ``dask`` parallelized version of the same method.
For all benchmarks, the trajectory file used was the
`YiiP trajectory `_
aligned using the ``MDAnalysis`` :class:`MDAnalysis.analysis.align.AlignTraj` class
rewritten in the ``zarrmd`` and ``H5MD`` formats using the ``zarrtraj`` package.
Highlights:
- The dask parallelized RMSF calculation performed ~4x faster than the serial calculation via MDAnalysis
on both local and S3-stored trajectory files. While this method is not yet implemented in ``zarrtraj``,
it may be in a future version
- The ``ZARRH5MDReader`` class performed ~2-4x faster than the ``H5MDReader`` class on iterating through
local trajectory files, though this may be because the files were written using a chunking strategy
favorable to the ``ZARRH5MDReader`` class.
- For each trajectory file, iterating through its timesteps using the ``ZARRH5MDReader`` from S3 storage took about twice as long
as iterating through the same file from local SSD storage.