Benchmarks
Speed benchmarks are available via AirSpeedVelocity here
Initial benchmarks were performed in the Beckstein Lab on Spudda, which has:
2 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
12 total cores
32GB RAM
Local file speed tests were performed in the 1.31 TB SSD scratch space using RAID 0.
The following metrics were measured:
ZARRH5MDDiskStrideTime: Time to iterate through all timesteps in SSD-stored trajectory files, using compressed and uncompressed zarrmd and h5md files.
ZARRH5MDS3StrideTime: Time to iterate through all timesteps in S3-stored trajectory files, using compressed and uncompressed zarrmd and h5md files.
H5MDReadersDiskStrideTime: Time to iterate through all timesteps in an SSD-stored trajectory file, using compressed and uncompressed h5md files, comparing the MDAnalysis.coordinates.H5MDReader and zarrtraj.ZARRH5MDReader classes.
H5MDFmtDiskRMSFTime: Time to calculate the root mean square fluctuation (RMSF) of the trajectory, using compressed and uncompressed SSD-stored zarrmd files, comparing the MDAnalysis.analysis.rms.RMSF method and a dask parallelized version of the same method.
H5MDFmtAWSRMSFTime: Time to calculate the root mean square fluctuation (RMSF) of the trajectory, using compressed and uncompressed S3-stored zarrmd files, comparing the MDAnalysis.analysis.rms.RMSF method and a dask parallelized version of the same method.
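To make the shape of a stride-time metric concrete, the following is a minimal sketch of a benchmark class laid out the way AirSpeedVelocity discovers benchmarks (a `params` list, a `setup` method, and a `time_*` method). It is not the benchmark code used for the results here: `FakeReader`, `StrideTimeSketch`, and `run_once` are hypothetical names, and the reader is an in-memory stand-in rather than a zarrmd or h5md reader.

```python
import time


class FakeReader:
    """Hypothetical stand-in for a trajectory reader; yields frames in order."""

    def __init__(self, n_frames, n_atoms):
        self.n_frames = n_frames
        self.n_atoms = n_atoms

    def __iter__(self):
        for i in range(self.n_frames):
            # Each "frame" is a list of (x, y, z) tuples.
            yield [(float(i), 0.0, 0.0)] * self.n_atoms


class StrideTimeSketch:
    """Hypothetical benchmark class: setup() plus a time_* method."""

    # asv runs the benchmark once per parameter value.
    params = ["compressed", "uncompressed"]
    param_names = ["compression"]

    def setup(self, compression):
        # A real benchmark would open the file matching `compression` here.
        self.reader = FakeReader(n_frames=100, n_atoms=10)

    def time_stride(self, compression):
        # The timed body: touch every frame exactly once.
        for frame in self.reader:
            pass


def run_once(bench, param):
    """Minimal manual driver, since asv itself is not imported in this sketch."""
    bench.setup(param)
    start = time.perf_counter()
    bench.time_stride(param)
    return time.perf_counter() - start


timings = {p: run_once(StrideTimeSketch(), p) for p in StrideTimeSketch.params}
```

Keeping file opening in `setup` means only the iteration itself is timed, which is the quantity the stride-time metrics describe.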
For all benchmarks, the trajectory file used was the YiiP trajectory, aligned using the MDAnalysis.analysis.align.AlignTraj class and rewritten in the zarrmd and H5MD formats using the zarrtraj package.
Highlights:
The dask parallelized RMSF calculation performed ~4x faster than the serial calculation via MDAnalysis on both local and S3-stored trajectory files. While this method is not yet implemented in zarrtraj, it may be in a future version.
The ZARRH5MDReader class performed ~2-4x faster than the H5MDReader class on iterating through local trajectory files, though this may be because the files were written using a chunking strategy favorable to the ZARRH5MDReader class.
For each trajectory file, iterating through its timesteps using the ZARRH5MDReader from S3 storage took about twice as long as iterating through the same file from local SSD storage.
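The RMSF lends itself to parallelization because it can be assembled from per-block sums: each block of frames contributes per-atom coordinate sums and squared-norm sums, which are merged at the end. The sketch below illustrates that reduction only. It is not the dask implementation that was benchmarked: it swaps in the standard-library `ThreadPoolExecutor` for dask, uses plain Python lists instead of arrays, and the names `partial_sums` and `rmsf` are hypothetical. Pure-Python threads will not reproduce the measured speedup, and the sum-of-squares formula is less numerically robust than a running-mean update.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_sums(frames):
    """Per-atom coordinate sums and squared-norm sums for one block of frames."""
    n_atoms = len(frames[0])
    sums = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    sq = [0.0] * n_atoms
    for frame in frames:
        for i, (x, y, z) in enumerate(frame):
            sums[i][0] += x
            sums[i][1] += y
            sums[i][2] += z
            sq[i] += x * x + y * y + z * z
    return len(frames), sums, sq


def rmsf(trajectory, n_blocks=1):
    """RMSF per atom; blocks of frames are reduced independently, then merged."""
    size = math.ceil(len(trajectory) / n_blocks)
    blocks = [trajectory[i:i + size] for i in range(0, len(trajectory), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_sums, blocks))
    total = sum(n for n, _, _ in parts)
    result = []
    for i in range(len(trajectory[0])):
        # Mean position and mean squared norm of atom i over all frames.
        mean = [sum(p[1][i][k] for p in parts) / total for k in range(3)]
        mean_sq = sum(p[2][i] for p in parts) / total
        # <|r|^2> - |<r>|^2, clamped to guard against round-off.
        var = mean_sq - sum(m * m for m in mean)
        result.append(math.sqrt(max(var, 0.0)))
    return result


# Tiny made-up trajectory: 4 frames, 2 atoms.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
serial = rmsf(traj, n_blocks=1)
blocked = rmsf(traj, n_blocks=2)
```

Because the merge step only adds sums, the blocked result matches the serial one regardless of how the frames are split, which is what allows the blocks to be processed independently.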