This page was generated from examples/iterload.ipynb.

Out-of-core calculations with md.iterload

[1]:
import mdtraj as md
import numpy as np

np.set_printoptions(threshold=50)

Sometimes your molecular dynamics trajectory files are too large to fit into memory. This can make analysis a burden, because you have to be very aware of the size of various objects. This can be a challenge in python because of the language’s automatic memory management.

Fortunately, python provides the iterator protocol that can help us out here. We can “stream through” a trajectory, without loading the entire thing into memory at all. Instead, we’ll process it in chunks.

For the purpose of this example, we’ll use a short trajectory that’s included with MDTraj for testing purposes. When you use this recipe yourself, you probably will want to point your code to your own trajectory file

[2]:
traj_filename = "data/frame0.h5"

First, if you only want a single frame of a trajectory, there’s no reason to load up the whole thing. md.load_frame can load up a single frame for you. Let’s get the first one.

[3]:
first_frame = md.load_frame(traj_filename, 0)
first_frame
[3]:
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells at 0x168ba2b00>

Using md.iterload, you can iterate through chunks of the trajectory. If you don’t retain a reference to the chunk as you iterate through, then the python garbage collector can recycle the memory.

[4]:
rmsds = []
for chunk in md.iterload(traj_filename, chunk=100):
    rmsds.append(md.rmsd(chunk, first_frame))
    print(chunk, "\n", chunk.time)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
 [500.00003 501.00003 502.00003 ... 597.      598.      599.     ]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
 [600.      601.      602.      ... 697.00006 698.00006 699.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
 [700.00006 701.00006 702.00006 ... 797.00006 798.00006 799.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
 [800.00006 801.00006 802.00006 ... 897.00006 898.00006 899.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
 [900.00006 901.00006 902.00006 ... 997.00006 998.00006 999.00006]
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells>
 [1000.00006]

Now, we’ve calculated all of the rmsds chunk by chunk, and we can take a look at them.

[5]:
rmsds = np.concatenate(rmsds)

print(rmsds)
print("max rmsd ", np.max(rmsds), "at index", np.argmax(rmsds))
[0.         0.05957904 0.12331477 ... 0.12567955 0.08629464 0.14820947]
max rmsd  0.18972313 at index 44