BPTI Spatial Decorrelation

# invoke some initial libraries that we want as part of this setup
import matplotlib.pylab as plt
import numpy as np
import MDAnalysis as mdanal
import pyemma
import os
import pyemma.coordinates as coor
import pyemma.msm as msm
import pyemma.plots as mplt

%matplotlib inline

Select Coordinates of Interest

The function below is a generic driver for obtaining coordinates of interest. The line with Ca specifies the coordinates selected, and is based on MDAnalysis selection syntax (http://pythonhosted.org/MDAnalysis/documentation_pages/selection.html). You can use a variety of atom selections that can be returned as a numpy array.

def getCoordinates(pdbFileName, trjFileName):
    u = mdanal.Universe(pdbFileName, trjFileName, permissive=False);
    frames = [];
    Ca = u.select_atoms('name CA');
    for ts in u.trajectory:
        frames.append(Ca.positions.T);
    return np.array(frames);

Loading data: BPTI ms Trajectory

This example uses the 1 millisecond trajectory of Bovine Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer [1]. We are using only Ca-coordinates in order to manage the data size.

trajfile = '/Users/fxp/BPTI-Analysis/bpti_ca_1ms_dt10ns_aligned.xtc'
topfile = '/Users/fxp/BPTI-Analysis/bpti_ca.pdb'
/Users/fxp/anaconda2/lib/python2.7/site-packages/pyemma/__init__.py:91: UserWarning: You are not using the latest release of PyEMMA. Latest is 2.4, you have 2.3.2.
  .format(latest=latest, current=current), category=UserWarning)
feat = coor.featurizer(topfile)
feat.add_selection(feat.select_Ca());
inp = coor.source(trajfile, feat);
# to generate pretty plots
plt.style.use('ggplot')

We import our main library called Anharmonic conformational analysis (pyANCA). Anca provides a number of functions that allows one to measure anharmonicity from MD simulations. Here we demonstrate the use of anca in quantifying fourth order statistics for a) the whole simulation and b) identifying which residues spend significant time exhibiting anharmonic fluctuations.

import pyANCA

Align Coordinates

We need to align the selected coordinates. This is to ensure that we have removed the translations and rotations and have a set of coordinates on which we can perform our analysis. anca provides support for two types of coordinate alignments: (1) The standard Kabsch alignment, and (2) an iterative alignment algorithm. You can use either function to align your trajectories.

Note:

We assume that you have taken care of putting protein chains or independent molecules together in the trajectories before aligning them. There are lots o pointers available, for e.g., in VMD you can use (PBCTools) [http://www.ks.uiuc.edu/Research/vmd/plugins/pbctools/]. Depending on which software you used for your simulations the procedures for putting together individual chains or molecules can be different.

coords = getCoordinates(topfile, trajfile)
from anca import IterativeMeansAlign
iterAlign = IterativeMeansAlign.IterativeMeansAlign();
[itr, avgCoordsAll, eRMSDAll, newCoordsAll] = iterAlign.iterativeMeans(coords, 0.001, 12);
/Users/fxp/anaconda2/lib/python2.7/site-packages/MDAnalysis/lib/mdamath.py:151: RuntimeWarning: invalid value encountered in float_scalars
  angle = np.arccos(np.dot(a, b) / (norm(a) * norm(b)))
No handlers could be found for logger "main.IterativeMeansAlign"
print newCoordsAll.shape
(412497, 3, 58)
coordsAll = np.reshape(newCoordsAll, (len(newCoordsAll), 3*58)).T;
avgCoordsAll = np.mean(coordsAll, 1); #print avgCoords;
tmpAll = np.reshape(np.tile(avgCoordsAll, 412497), (412497,3*58)).T;
caDevsMDall = coordsAll - tmpAll;
print caDevsMDall.shape;
(174, 412497)
print coordsAll.shape
(174, 412497)

Spatial Decorrelation of Order 2 (SD2)

Parameters:

data – a 3n x T data matrix (number 3 is due to the x,y,z coordinates for each atom). Maybe a numpy
array or a matrix where,

n: size of the protein

T: number of snapshots of MD trajectory

m – dimensionality of the subspace we are interested in; Default value is None, in which case m = n
verbose – print information on progress. Default is true.

Returns:

A 3n x m matrix U (NumPy matrix type), such that Y = U * data is a 2nd order spatially whitened
coordinates extracted from the 3n x T data matrix. If m is omitted, U is a square 3n x 3n matrix.

Ds: has eigen values sorted by increasing variance

PCs: holds the index for m most significant principal components by decreasing variance S = Ds[PCs]

S – Eigen values of the ‘data’ covariance matrix

B – Eigen vectors of the ‘data’ covariance matrix. The eigen vectors are orthogonal.
from pyANCA import SD2
(Y, S, B, U) = SD2.SD2(coordsAll.T, m=174);
2nd order Spatial Decorrelation -> Looking for 174 sources
2nd order Spatial Decorrelation -> Removing the mean value
2nd order Spatial Decorrelation -> Whitening the data
np.savetxt('eigvectorsSD2.out',B,delimiter=',')
np.savetxt('eigenvaluesSD2.out',S,delimiter=',')

Spatial Decorrelation Module of Order 4 (SD4)

Parameters:

Y -- an mxT spatially whitened matrix (m dimensionality of subspace, T snapshots). May be a numpy
array or a matrix where

m -- dimensionality of the subspace we are interested in. Defaults to None, in which case m=n.

T -- number of snapshots of MD trajectory

U -- whitening matrix obtained after doing the PCA analysis on m components of real data

verbose -- print info on progress. Default is True.

Returns:

W -- a separating matrix for spatial decorrelation of order 4
from pyANCA import SD4
W = SD4.SD4(Y[0:10,:], m=10, U=U[0:10,:])
4th order Spatial Decorrelation -> Estimating cumulant matrices
SD4 -> Contrast optimization by joint diagonalization
SD4 -> Sweep #  0 completed in 45 rotations
SD4 -> Sweep #  1 completed in 45 rotations
SD4 -> Sweep #  2 completed in 45 rotations
SD4 -> Sweep #  3 completed in 45 rotations
SD4 -> Sweep #  4 completed in 45 rotations
SD4 -> Sweep #  5 completed in 45 rotations
SD4 -> Sweep #  6 completed in 44 rotations
SD4 -> Sweep #  7 completed in 45 rotations
SD4 -> Sweep #  8 completed in 45 rotations
SD4 -> Sweep #  9 completed in 44 rotations
SD4 -> Sweep # 10 completed in 44 rotations
SD4 -> Sweep # 11 completed in 43 rotations
SD4 -> Sweep # 12 completed in 42 rotations
SD4 -> Sweep # 13 completed in 43 rotations
SD4 -> Sweep # 14 completed in 41 rotations
SD4 -> Sweep # 15 completed in 40 rotations
SD4 -> Sweep # 16 completed in 40 rotations
SD4 -> Sweep # 17 completed in 38 rotations
SD4 -> Sweep # 18 completed in 36 rotations
SD4 -> Sweep # 19 completed in 34 rotations
SD4 -> Sweep # 20 completed in 35 rotations
SD4 -> Sweep # 21 completed in 32 rotations
SD4 -> Sweep # 22 completed in 30 rotations
SD4 -> Sweep # 23 completed in 30 rotations
SD4 -> Sweep # 24 completed in 26 rotations
SD4 -> Sweep # 25 completed in 24 rotations
SD4 -> Sweep # 26 completed in 19 rotations
SD4 -> Sweep # 27 completed in 17 rotations
SD4 -> Sweep # 28 completed in 17 rotations
SD4 -> Sweep # 29 completed in 11 rotations
SD4 -> Sweep # 30 completed in 4 rotations
SD4 -> Sweep # 31 completed in 1 rotations
SD4 -> Sweep # 32 completed in 0 rotations
TD4 -> Total of 1095 Givens rotations
TD4 -> Sorting the components
TD4 -> Fixing the signs
(10, 174)

References

  1. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010). doi: 10.1126/science.1187409.