The power of modern computers makes it possible to perform simulations
of highly complex chemical systems. These simulations provide information
on the motions of thousands, or sometimes even millions, of atoms, which
can be very difficult to interpret. A commonly employed analysis strategy is
to calculate a histogram along a few collective variables (CVs), because the
underlying free energy can be calculated from this histogram. Finding appropriate
collective variables is difficult and is often done using physical or chemical
intuition obtained from experiments. This is problematic if we want to
predict new chemical structures or novel reaction mechanisms from simulations
alone. We are thus interested in using machine learning algorithms
to generate simplified representations of the data obtained from atomistic
simulations, so that these data can be more easily understood and interpreted
by a human user.
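The step from a histogram to a free energy uses the relation F(s) = -kT ln P(s), where P(s) is the probability density along the CV s. The minimal Python sketch below illustrates this under stated assumptions: the samples are unbiased (data from a biased run such as metadynamics would first need reweighting), kT is given in kJ/mol at 300 K, and the function name `free_energy_profile` and the synthetic double-well data are hypothetical, chosen only for illustration.

```python
import numpy as np

KT_300K = 2.494  # k_B * T in kJ/mol at 300 K

def free_energy_profile(cv_samples, bins=50, kT=KT_300K):
    """Return bin centers and F(s) = -kT ln P(s), shifted so that min F = 0.

    Assumes unbiased samples; biased (e.g. metadynamics) data must be reweighted first.
    """
    density, edges = np.histogram(cv_samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    populated = density > 0  # ln(0) is undefined, so empty bins are dropped
    free_energy = -kT * np.log(density[populated])
    free_energy -= free_energy.min()  # put the deepest basin at F = 0
    return centers[populated], free_energy

# Synthetic "double-well" data: 70% of samples near s = -1, 30% near s = +1.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-1.0, 0.2, 70_000), rng.normal(1.0, 0.2, 30_000)])
s, F = free_energy_profile(samples)
```

Because the basin at s = -1 holds 70% of the samples, its minimum should lie roughly kT ln(7/3), about 2.1 kJ/mol, below the minimum at s = +1. Only free energy differences are meaningful, which is why the profile is shifted to zero at its lowest point.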
A sketch-map projection for a 50 ns reconnaissance metadynamics simulation
of ala12. The embedded points are colored by the number of residues that STRIDE
assigns to an alpha helix or a beta sheet. Remarkably, sketch-map is able to
distinguish configurations with different types of secondary structure even though
the algorithm is given no information on what constitutes an alpha helix or a beta sheet.
The work in the department along these lines focuses on the application
and development of the sketch-map algorithm [1, 2]. This algorithm
takes as input a set of high-dimensional landmark frames and generates a
low-dimensional map that shows how these points are distributed in the
high-dimensional space. Projections of further high-dimensional points can
then easily be calculated using the positions of the landmarks and
their projections. We have successfully applied this algorithm to
problems in protein folding [1]. The sketch-map code is available to download
from http://sketchmap.berlios.de.
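The two stages described above, fitting a map for the landmarks and then projecting further points onto it, can be sketched in Python as follows. This is an illustrative sketch, not the distributed sketch-map code or its interface. It assumes a sigmoid switching function of the form s(r) = 1 - (1 + (2^(a/b) - 1)(r/sigma)^a)^(-b/a), which equals 1/2 at r = sigma, applied with parameters (A, B) to high-dimensional distances and (a, b) to low-dimensional ones. The function names `sigmoid`, `fit_landmarks` and `project`, the default parameter values, the unit weights, the PCA starting guess and the use of SciPy's L-BFGS-B with numerical gradients are all hypothetical simplifications; a practical implementation would need landmark weights and a more careful global search to avoid local minima.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sigmoid(r, sigma, a, b):
    # s(r) = 1 - (1 + (2^(a/b) - 1) (r/sigma)^a)^(-b/a); s(sigma) = 1/2
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)

def fit_landmarks(X, dim=2, sigma=1.0, A=2, B=6, a=2, b=6):
    """Map landmarks X (n x D) to `dim` dimensions by minimising the stress
    sum_{i<j} [s_AB(R_ij) - s_ab(r_ij)]^2 (unit weights assumed)."""
    n = X.shape[0]
    target = sigmoid(pdist(X), sigma, A, B)  # transformed high-dimensional distances
    # Start from a PCA projection so that the result is deterministic.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    x0 = (Xc @ vt[:dim].T).ravel()

    def stress(flat):
        r = pdist(flat.reshape(n, dim))  # low-dimensional distances, same pair order
        return np.sum((target - sigmoid(r, sigma, a, b)) ** 2)

    return minimize(stress, x0, method="L-BFGS-B").x.reshape(n, dim)

def project(Y, X, x, sigma=1.0, A=2, B=6, a=2, b=6):
    """Project new points Y onto the map, keeping landmarks X and their map x fixed."""
    out = []
    for y in Y:
        R = np.linalg.norm(X - y, axis=1)  # distances to every landmark
        target = sigmoid(R, sigma, A, B)

        def stress(p):
            r = np.linalg.norm(x - p, axis=1)
            return np.sum((target - sigmoid(r, sigma, a, b)) ** 2)

        start = x[np.argmin(R)]  # begin at the nearest landmark's projection
        out.append(minimize(stress, start, method="L-BFGS-B").x)
    return np.array(out)
```

The sigmoids are what set this apart from plain metric multidimensional scaling: distances much shorter than sigma are mapped close to 0 and much longer ones saturate close to 1, so the map is mainly asked to reproduce the intermediate range of distances. Note that projecting a new point only requires its distances to the landmarks, which is why further points are cheap to add once the landmark map exists.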
The free energy surface for ala12 in implicit solvent at the unfolding temperature,
displayed as a function of a set of sketch-map coordinates. In this representation the full
complexity of the unfolded state, including the many energetic basins that comprise it, is visible.
References:
[1] M. Ceriotti, G. A. Tribello and M. Parrinello. Proceedings of the National Academy of Sciences, (2011).
[2] G. A. Tribello, M. Ceriotti and M. Parrinello. Proceedings of the National Academy of Sciences, 109(14), 5196–5201 (2012).