BPTI Spatial Decorrelation ========================== .. code:: ipython2 # invoke some initial libraries that we want as part of this setup import matplotlib.pylab as plt import numpy as np import MDAnalysis as mdanal import pyemma import os import pyemma.coordinates as coor import pyemma.msm as msm import pyemma.plots as mplt %matplotlib inline Select Coordinates of Interest The function below is a generic driver for obtaining coordinates of interest. The line with Ca specifies the coordinates selected, and is based on MDAnalysis selection syntax (http://pythonhosted.org/MDAnalysis/documentation\_pages/selection.html). You can use a variety of atom selections that can be returned as a numpy array. .. code:: ipython2 def getCoordinates(pdbFileName, trjFileName): u = mdanal.Universe(pdbFileName, trjFileName, permissive=False); frames = []; Ca = u.select_atoms('name CA'); for ts in u.trajectory: frames.append(Ca.positions.T); return np.array(frames); Loading data: BPTI ms Trajectory This example uses the 1 millisecond trajectory of Bovine Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer [1]. We are using only Ca-coordinates in order to manage the data size. .. code:: ipython2 trajfile = '/Users/fxp/BPTI-Analysis/bpti_ca_1ms_dt10ns_aligned.xtc' topfile = '/Users/fxp/BPTI-Analysis/bpti_ca.pdb' .. parsed-literal:: /Users/fxp/anaconda2/lib/python2.7/site-packages/pyemma/__init__.py:91: UserWarning: You are not using the latest release of PyEMMA. Latest is 2.4, you have 2.3.2. .format(latest=latest, current=current), category=UserWarning) .. code:: ipython2 feat = coor.featurizer(topfile) feat.add_selection(feat.select_Ca()); inp = coor.source(trajfile, feat); .. code:: ipython2 # to generate pretty plots plt.style.use('ggplot') We import our main library called Anharmonic conformational analysis (pyANCA). Anca provides a number of functions that allows one to measure anharmonicity from MD simulations. Here we demonstrate the use of anca in quantifying fourth order statistics for a) the whole simulation and b) identifying which residues spend significant time exhibiting anharmonic fluctuations. .. code:: ipython2 import pyANCA Align Coordinates We need to align the selected coordinates. This is to ensure that we have removed the translations and rotations and have a set of coordinates on which we can perform our analysis. anca provides support for two types of coordinate alignments: (1) The standard Kabsch alignment, and (2) an iterative alignment algorithm. You can use either function to align your trajectories. Note: We assume that you have taken care of putting protein chains or independent molecules together in the trajectories before aligning them. There are lots o pointers available, for e.g., in VMD you can use (PBCTools) [http://www.ks.uiuc.edu/Research/vmd/plugins/pbctools/]. Depending on which software you used for your simulations the procedures for putting together individual chains or molecules can be different. .. code:: ipython2 coords = getCoordinates(topfile, trajfile) from anca import IterativeMeansAlign iterAlign = IterativeMeansAlign.IterativeMeansAlign(); [itr, avgCoordsAll, eRMSDAll, newCoordsAll] = iterAlign.iterativeMeans(coords, 0.001, 12); .. parsed-literal:: /Users/fxp/anaconda2/lib/python2.7/site-packages/MDAnalysis/lib/mdamath.py:151: RuntimeWarning: invalid value encountered in float_scalars angle = np.arccos(np.dot(a, b) / (norm(a) * norm(b))) No handlers could be found for logger "main.IterativeMeansAlign" .. code:: ipython2 print newCoordsAll.shape .. parsed-literal:: (412497, 3, 58) .. code:: ipython2 coordsAll = np.reshape(newCoordsAll, (len(newCoordsAll), 3*58)).T; avgCoordsAll = np.mean(coordsAll, 1); #print avgCoords; tmpAll = np.reshape(np.tile(avgCoordsAll, 412497), (412497,3*58)).T; caDevsMDall = coordsAll - tmpAll; print caDevsMDall.shape; .. parsed-literal:: (174, 412497) .. code:: ipython2 print coordsAll.shape .. parsed-literal:: (174, 412497) Spatial Decorrelation of Order 2 (SD2) Parameters: :: data – a 3n x T data matrix (number 3 is due to the x,y,z coordinates for each atom). Maybe a numpy array or a matrix where, n: size of the protein T: number of snapshots of MD trajectory m – dimensionality of the subspace we are interested in; Default value is None, in which case m = n verbose – print information on progress. Default is true. Returns: :: A 3n x m matrix U (NumPy matrix type), such that Y = U * data is a 2nd order spatially whitened coordinates extracted from the 3n x T data matrix. If m is omitted, U is a square 3n x 3n matrix. Ds: has eigen values sorted by increasing variance PCs: holds the index for m most significant principal components by decreasing variance S = Ds[PCs] S – Eigen values of the ‘data’ covariance matrix B – Eigen vectors of the ‘data’ covariance matrix. The eigen vectors are orthogonal. .. code:: ipython2 from pyANCA import SD2 (Y, S, B, U) = SD2.SD2(coordsAll.T, m=174); .. parsed-literal:: 2nd order Spatial Decorrelation -> Looking for 174 sources 2nd order Spatial Decorrelation -> Removing the mean value 2nd order Spatial Decorrelation -> Whitening the data .. code:: ipython2 np.savetxt('eigvectorsSD2.out',B,delimiter=',') np.savetxt('eigenvaluesSD2.out',S,delimiter=',') Spatial Decorrelation Module of Order 4 (SD4) Parameters: ::             Y -- an mxT spatially whitened matrix (m dimensionality of subspace, T snapshots). May be a numpy                 array or a matrix where        m -- dimensionality of the subspace we are interested in. Defaults to None, in which case m=n.         T -- number of snapshots of MD trajectory         U -- whitening matrix obtained after doing the PCA analysis on m components of real data                 verbose -- print info on progress. Default is True.     Returns: :: W -- a separating matrix for spatial decorrelation of order 4 .. code:: ipython2 from pyANCA import SD4 W = SD4.SD4(Y[0:10,:], m=10, U=U[0:10,:]) .. parsed-literal:: 4th order Spatial Decorrelation -> Estimating cumulant matrices SD4 -> Contrast optimization by joint diagonalization SD4 -> Sweep # 0 completed in 45 rotations SD4 -> Sweep # 1 completed in 45 rotations SD4 -> Sweep # 2 completed in 45 rotations SD4 -> Sweep # 3 completed in 45 rotations SD4 -> Sweep # 4 completed in 45 rotations SD4 -> Sweep # 5 completed in 45 rotations SD4 -> Sweep # 6 completed in 44 rotations SD4 -> Sweep # 7 completed in 45 rotations SD4 -> Sweep # 8 completed in 45 rotations SD4 -> Sweep # 9 completed in 44 rotations SD4 -> Sweep # 10 completed in 44 rotations SD4 -> Sweep # 11 completed in 43 rotations SD4 -> Sweep # 12 completed in 42 rotations SD4 -> Sweep # 13 completed in 43 rotations SD4 -> Sweep # 14 completed in 41 rotations SD4 -> Sweep # 15 completed in 40 rotations SD4 -> Sweep # 16 completed in 40 rotations SD4 -> Sweep # 17 completed in 38 rotations SD4 -> Sweep # 18 completed in 36 rotations SD4 -> Sweep # 19 completed in 34 rotations SD4 -> Sweep # 20 completed in 35 rotations SD4 -> Sweep # 21 completed in 32 rotations SD4 -> Sweep # 22 completed in 30 rotations SD4 -> Sweep # 23 completed in 30 rotations SD4 -> Sweep # 24 completed in 26 rotations SD4 -> Sweep # 25 completed in 24 rotations SD4 -> Sweep # 26 completed in 19 rotations SD4 -> Sweep # 27 completed in 17 rotations SD4 -> Sweep # 28 completed in 17 rotations SD4 -> Sweep # 29 completed in 11 rotations SD4 -> Sweep # 30 completed in 4 rotations SD4 -> Sweep # 31 completed in 1 rotations SD4 -> Sweep # 32 completed in 0 rotations TD4 -> Total of 1095 Givens rotations TD4 -> Sorting the components TD4 -> Fixing the signs (10, 174) References 1. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010). doi: 10.1126/science.1187409. .. vim: tw=75