Automated analysis of simulation trajectories

The power of modern computers makes it possible to perform simulations of highly complex chemical systems. These simulations provide information on the motions of thousands or sometimes even millions of atoms, which can be very difficult to interpret. A commonly employed analysis strategy is to calculate a histogram along a few collective variables (CVs), because the underlying free energy can be calculated from this histogram. Finding appropriate collective variables is difficult and is often done using physical or chemical intuition obtained from experiments. This is problematic if we want to predict new chemical structures or novel reaction mechanisms based on simulations alone. We are thus interested in using machine learning algorithms to generate simplified representations of the data obtainable from atomistic simulations, so that the data can be more easily understood and interpreted by a human user.
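The histogram-to-free-energy step mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not code from the department: it assumes unbiased equilibrium samples of a single CV (data from a biased run, such as metadynamics, would need reweighting first), and the function name, the binning arguments and the choice of kJ/mol units are all hypothetical. It uses the standard relation F(s) = -kB T ln P(s), where P(s) is the normalised histogram.

```python
import math
import random

# Boltzmann constant in kJ/(mol K); the unit choice is an assumption of this sketch.
KB = 0.008314462618


def free_energy_profile(cv_values, n_bins, lo, hi, temperature):
    """Estimate F(s) = -kB*T*ln P(s) from samples of one collective variable.

    Returns (bin_centers, free_energies). Empty bins are assigned +inf and
    the profile is shifted so that its minimum is zero.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in cv_values:
        if lo <= s < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    kT = KB * temperature
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    fes = [-kT * math.log(c / total) if c > 0 else math.inf for c in counts]
    f_min = min(fes)
    return centers, [f - f_min for f in fes]


if __name__ == "__main__":
    # Hypothetical data: a CV fluctuating around zero.
    random.seed(0)
    samples = [random.gauss(0.0, 1.0) for _ in range(10000)]
    centers, fes = free_energy_profile(samples, 10, -3.0, 3.0, 300.0)
    for s, f in zip(centers, fes):
        print(f"{s:6.2f}  {f:8.3f}")
```

The difficulty described in the text is not this calculation but the choice of the CV whose values are passed in.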

Figure: A sketch-map projection for a 50 ns reconnaissance metadynamics simulation of ala12. The embedded points are colored according to the number of residues that, according to STRIDE, are part of an alpha helix or beta sheet. Remarkably, sketch-map is able to distinguish configurations with different secondary structure types even though the algorithm is given no information on what constitutes an alpha helix or beta sheet.

The work in the department along these lines focuses on the application and development of the sketch-map algorithm [1, 2]. This algorithm takes as input a set of high-dimensional landmark frames and generates a low-dimensional map that shows how these points are distributed in the high-dimensional space. Projections of further high-dimensional points can then easily be calculated using the data on the positions of the landmarks and their projections. We have successfully applied this algorithm to studying problems in protein folding [1]. The sketch-map code is available to download from
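The two-stage workflow described above (embed the landmarks, then place further points relative to them) can be illustrated with the following Python sketch. It is not the sketch-map algorithm or the sketch-map code: it swaps in plain classical multidimensional scaling for the landmark step and a simple gradient descent on the distance mismatch for the out-of-sample step, whereas sketch-map's own cost function is described in [1, 2] and is not reproduced here. All function names and parameters are hypothetical.

```python
import numpy as np


def classical_mds(landmarks, n_dims=2):
    """Embed landmark points with classical (Torgerson) MDS.

    Stand-in for the landmark step; this is NOT sketch-map's cost function.
    """
    # Squared pairwise distances between all landmarks.
    sq = np.sum((landmarks[:, None, :] - landmarks[None, :, :]) ** 2, axis=-1)
    n = sq.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues and their eigenvectors.
    top = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def project_new_point(point, landmarks, embedding, n_steps=500, step=0.2):
    """Place a new high-dimensional point on an existing landmark map.

    Minimises the mean squared mismatch between low-dimensional and
    high-dimensional distances to the landmarks by gradient descent.
    """
    target = np.linalg.norm(landmarks - point, axis=1)
    # Start from the projection of the nearest landmark.
    x = embedding[np.argmin(target)].copy()
    for _ in range(n_steps):
        diff = x - embedding
        d = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
        grad = 2.0 * np.mean(((d - target) / d)[:, None] * diff, axis=0)
        x -= step * grad
    return x


if __name__ == "__main__":
    # Hypothetical data: 40 random landmarks in a 10-dimensional space.
    rng = np.random.default_rng(0)
    landmarks = rng.normal(size=(40, 10))
    embedding = classical_mds(landmarks)
    new_frame = rng.normal(size=10)
    print(project_new_point(new_frame, landmarks, embedding))
```

Only the landmarks enter the comparatively expensive embedding step. Each further frame is then placed independently using its distances to the landmarks, which is why projecting additional points is cheap.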

Figure: The free energy surface for ala12 in implicit solvent at the unfolding temperature, displayed as a function of a set of sketch-map coordinates. In this representation the full complexity of the unfolded state, and of the many energetic basins that comprise it, is visible.


[1] M. Ceriotti, G. A. Tribello and M. Parrinello. Proceedings of the National Academy of Sciences, (2011).

[2] G. A. Tribello, M. Ceriotti and M. Parrinello. Proceedings of the National Academy of Sciences, 109(14), 5196–5201 (2012).

Staff involved

Gareth Tribello