YiiP Protein Example
To get started immediately with Zarrtraj, we have made the topology and trajectory of the YiiP protein in a POPC membrane publicly available for streaming. Unlike in the walkthrough, no read/write credentials are needed. The trajectory is stored in the ZarrMD format for optimal streaming performance.
To access the trajectory, you can copy and paste this code into a Python script:
import zarrtraj
import MDAnalysis as mda
import fsspec

with fsspec.open("gcs://zarrtraj-test-data/YiiP_system.pdb", "r") as top:
    u = mda.Universe(
        top, "gcs://zarrtraj-test-data/yiip.zarrmd", topology_format="PDB"
    )

    protein = u.select_atoms("protein")

    for ts in u.trajectory[::100]:
        print(f"{ts.frame}, {ts.time}, {protein.center_of_mass()}")
In this example, we first import all necessary packages:
zarrtraj, for the functionality to read a trajectory from cloud storage; it automatically hooks into MDAnalysis to make the trajectory available as part of a Universe.
Note
Whenever you want to read or write the ZarrMD format, you need to import zarrtraj. You do not have to explicitly call any functions or classes inside the zarrtraj package because on import it automatically registers itself as a reader/writer with MDAnalysis. The import zarrtraj statement, together with importing MDAnalysis (in any order), is sufficient for MDAnalysis to “know” how to work with trajectories stored in the cloud in zarr and h5md format.
fsspec, to access a simple file in cloud storage with a file-like interface.
Note
While there is not yet an officially recommended way to access cloud-stored topologies, this method of using fsspec to open a Python file-like object from the URL of a topology in PDB format works with MDAnalysis 2.7.0. Check back later for further development!
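The registration-on-import behaviour described in the first note can be pictured with a small, self-contained sketch. This is not the code of MDAnalysis or zarrtraj; the names READERS, ReaderBase, DemoZarrReader, open_trajectory and the "DEMOZARR" format are hypothetical and only illustrate the general pattern of a class adding itself to a lookup table as soon as it is defined.

```python
# Illustrative registry only; MDAnalysis keeps its own internal table of readers.
READERS = {}


class ReaderBase:
    """Base class whose subclasses register themselves when they are defined."""

    format = None  # subclasses set this to the format name they handle

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.format is not None:
            READERS[cls.format] = cls

    def __init__(self, filename):
        self.filename = filename


# In a real plugin package this class definition would live inside the package,
# so that merely importing the package executes it and fills the registry.
class DemoZarrReader(ReaderBase):
    format = "DEMOZARR"  # hypothetical format name


def open_trajectory(filename, fmt):
    """Look up the reader class registered for fmt and instantiate it."""
    return READERS[fmt](filename)


# Hypothetical URL; nothing is downloaded, the reader only stores the string.
reader = open_trajectory("gcs://bucket/traj.demozarr", "DEMOZARR")
print(type(reader).__name__)  # DemoZarrReader
```

The calling code never refers to DemoZarrReader directly, which mirrors why the quick-start script imports zarrtraj but never calls anything from it.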
We then create the basic MDAnalysis data structure, the Universe,
from the topology file “YiiP_system.pdb” (which contains static information about the individual
atoms such as their names, types, and organization as a biomolecule) and the data that change over time,
namely the positions of the atoms in the trajectory “yiip.zarrmd”. Both topology and trajectory
files can be stored in the cloud (here in Google Cloud Storage).
The specific URI string for the trajectory tells MDAnalysis to use zarrtraj to access the file.
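To make the parts of the trajectory URI concrete, the sketch below splits it with the Python standard library. It only shows what information the string carries (protocol, bucket, object name, file extension); how MDAnalysis and zarrtraj actually inspect the string is not shown on this page, so the variable names here are purely illustrative.

```python
import posixpath
from urllib.parse import urlparse

uri = "gcs://zarrtraj-test-data/yiip.zarrmd"

parts = urlparse(uri)
protocol = parts.scheme          # storage protocol, here Google Cloud Storage
bucket = parts.netloc            # bucket that holds the data
key = parts.path.lstrip("/")     # object name inside the bucket
extension = posixpath.splitext(key)[1]  # suffix hinting at the file format

print(protocol, bucket, key, extension)
# gcs zarrtraj-test-data yiip.zarrmd .zarrmd
```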
We then use standard MDAnalysis functionality to first select a part of the system for
analysis, namely, the protein, and then iterate over the trajectory in steps of 100
frames. For each loaded frame, we print information about the frame number and recorded
time in the trajectory and perform a simple analysis task by calculating and printing
the center of mass of the protein (see AtomGroup.center_of_mass).
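As a rough illustration of the two operations in the loop, the following pure-Python sketch computes a mass-weighted center of mass and shows which frame indices a [::100] slice visits. It assumes the usual definition (sum of mass times position, divided by total mass); the three-atom frame and the frame count of 450 are made-up values, not data from the YiiP trajectory.

```python
def center_of_mass(positions, masses):
    """Return the mass-weighted mean position as an (x, y, z) tuple."""
    total_mass = sum(masses)
    return tuple(
        sum(m * r[axis] for m, r in zip(masses, positions)) / total_mass
        for axis in range(3)
    )


# Hypothetical three-atom "frame": coordinates and masses are invented.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
masses = [1.0, 1.0, 2.0]

com = center_of_mass(positions, masses)
print(com)  # (0.5, 2.0, 0.0)

# Slicing with a step of 100 visits every 100th frame, starting at frame 0.
n_frames = 450  # hypothetical trajectory length
visited = list(range(n_frames))[::100]
print(visited)  # [0, 100, 200, 300, 400]
```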
See also
For an executable example of a full MDAnalysis
RMSD analysis on this trajectory in a
Jupyter notebook, see the rmsd_yiip.ipynb example notebook on GitHub.