.. _walkthrough: Walkthrough =========== This walkthrough will guide you through the process of writing and then reading H5MD-formatted trajectories from cloud storage using AWS S3 as an example. To learn more about reading and writing trajectories from different cloud storage providers, including Google Cloud and Azure, see the :ref:`API documentation `. We will use the `YiiP 9ns trajectory from MDAnalysisData `_ as an example trajectory for the walkthrough. .. note:: In examples that read or write the :ref:`ZarrMD format `, ``zarrtraj`` is imported even though its namespace is not explicitly called. This is because MDAnalysis uses `meta classes for registering reader and writer formats `_, so calling ``import zarrtraj`` allows MDAnalysis to use the classes provided by this package. Writing trajectories to cloud storage ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In this section, we show how to upload both H5MD files and ZarrMD files to an S3 bucket. Setup: Creating an AWS S3 bucket ################################ First, create an AWS S3 bucket. This requires that an S3 Bucket is setup and configured for write access using the credentials stored in "sample_profile" (for the examples shown here, but a profile can be named arbitrarily). If you've never configured an S3 Bucket before, see `this guide `_. You can setup a profile to easily manage AWS credentials using `this VSCode extension `_. Here is a sample profile (stored in ~/.aws/credentials) where `the key is an access key associated with a user that has read and write permissions for the bucket `_:: [sample_profile] aws_access_key_id = Writing trajectories directly from MDAnalysis into a ZarrMD file in an S3 Bucket (preferred method) #################################################################################################### Using the created credentials with read/write access, you can write a trajectory into your bucket. You can change the stored precision of floating point values in the trajectory file with the optional ``precision`` kwarg and pass in any ```numcodecs.Codec``` compressor with the optional ``compressor`` kwarg. See `numcodecs `_ for more on the available compressors. Chunking is automatically determined for all datasets to be optimized for cloud storage reading speed and is not configurable by the user:: import zarrtraj import MDAnalysis as mda import MDAnalysisData import numcodecs import os os.environ["AWS_PROFILE"] = "sample_profile" os.environ["AWS_REGION"] = "us-west-1" if __name__ == "__main__": dataset = MDAnalysisData.yiip_equilibrium.fetch_yiip_equilibrium_short() u = mda.Universe(dataset.topology, dataset.trajectory) with mda.Writer( "s3://sample-bucket-name/YiiP_system_9ns_center.zarrmd", n_atoms=u.atoms.n_atoms, precision=3, compressor=numcodecs.Blosc(cname="zstd", clevel=9), ) as W: for ts in u.trajectory: W.write(u.atoms) Uploading an H5MD file to an S3 bucket ###################################### .. note:: Uploading H5MD trajectories programmatically to AWS S3 requires the `Boto3 package `_ to be installed. This won't be installed by default with Zarrtraj because we recommend writing directly to S3 using the :ref:`ZarrMD format ` MDAnalysis can write a trajectory from `any of its supported formats into H5MD `_. Because the ``H5MDWriter`` (provided by MDAnalysis, not this package) does not automatically determine optimal chunks, we recommend using the ```chunks``` kwarg with the MDAnalysis ``H5MDWriter`` with a value that yields ~8-16MB chunks of data for best S3 performance. Once written locally, you can upload the trajectory to S3 programatically:: import MDAnalysisData import MDAnalysis as mda import os import boto3 os.environ["AWS_PROFILE"] = "sample_profile" # This is the AWS region where the bucket is located os.environ["AWS_REGION"] = "us-west-1" def upload_h5md_file(bucket_name, file_name): s3_client = boto3.client("s3") obj_name = os.path.basename(file_name) response = s3_client.upload_file(file_name, bucket_name, obj_name) if __name__ == "__main__": dataset = MDAnalysisData.yiip_equilibrium.fetch_yiip_equilibrium_short() u = mda.Universe(dataset.topology, dataset.trajectory) with mda.Writer( "YiiP_system_9ns_center.h5md", n_atoms=u.atoms.n_atoms, # (111815 atoms * 4 bytes per float * 3 (xyz)) = ~1.34 MB per frame # 8 frames per chunk to reach goal of 8-12 MB per chunk chunks=(8, u.atoms.n_atoms, 3), ) as W: for ts in u.trajectory: W.write(u.atoms) upload_h5md_file( "sample-bucket-name", "YiiP_system_9ns_center.h5md", ) You can also upload the H5MD file directly using the AWS web interface by navigating to S3, the bucket name, and pressing "upload". Reading your H5MD file ^^^^^^^^^^^^^^^^^^^^^^ After the file is uploaded, you can use the same credentials to stream the file into MDAnalysis:: import zarrtraj import MDAnalysis as mda import MDAnalysisData import os os.environ["AWS_PROFILE"] = "sample_profile" os.environ["AWS_REGION"] = "us-west-1" dataset = MDAnalysisData.yiip_equilibrium.fetch_yiip_equilibrium_short() # here, we show the .zarrmd file being read, but the .h5md file could be read identically u = mda.Universe(dataset.topology, "s3://sample-bucket-name/YiiP_system_9ns_center.zarrmd") protein = u.select_atoms("protein") for ts in u.trajectory[::100]: print(f"{ts.frame}, {ts.time}, {protein.center_of_mass()}") If you have additional questions, please don't hesitate to open a discussion on the `zarrtarj github `_. The `MDAnalysis discord `_ is also a great resource for asking questions and getting involved in MDAnalysis.