Thoughts on Continuous vs. Multiple MD

  1. By Robert Molt (from the AMBER Archive)

As was stated in previous emails, no one claims that you can run
absurdly short simulations (5 ns) and get the same result as a long
simulation (except in the limit of an infinite number of simulations).
Your example did not prove the superiority of long trajectory runs; you
just ran a simulation so short that nothing could happen.

The question of multiple short vs. one long trajectory is only
interesting when the short trajectories still have a chance of sampling
the relevant event.

Which again goes back to the previous answer: it is system dependent.
You have to know something about the time scale of the processes
involved to say which is better (multiple short vs. one long
trajectory) in a given case.

If I sample 1 s of your life 4,000 times, that is a good way to sample
events like “How often does he look at a computer screen?” but not “How
often does he eat food?” If I sample one continuous 4,000 s stretch,
that works for both questions.

2. By Ross

Ultimately, one long simulation and many short simulations are the same
thing. In other words, an infinitely long simulation is exactly the same as
an infinite number of one-step simulations. The caveat, of course, is that
the starting conditions for the infinitely many short simulations must be
independent and, in this extreme example, represent a correctly weighted
set of accessible structures. The underlying issue is that most of our
simulations are biased by the starting conditions. Almost all of us start
simulations exclusively from the crystal structure and, for reasons I have
never understood, consider an RMSD that stays close to the crystal
structure to be a good sign.

I think the real problem is a lack of error estimation. Far too many MD
papers I see have little (if any) decent error estimation. For example,
someone runs a single 50 ns simulation and reports a binding free energy.
That is really not much use to anybody. Why 50 ns? Why not 5 ns? Why not
500 ns? What a reviewer is most likely getting at when he/she asks for
repeats is actually a coded way of saying: “show me that you didn’t just
conveniently stop your single simulation when you got the answer you
wanted.” Of course they can’t say that directly, so they ask you for
repeats.

I think what is really needed in such papers is evidence of convergence
that allows a reader to decide how much they want to trust your results.
For example, if one is calculating a binding energy, why not produce a
plot showing the binding-energy prediction as a function of simulation
length, essentially a cumulative average? It would allow someone to see
how much faith they can put in the final binding energy reported.
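
As a rough illustration (not part of the original emails), a convergence
plot of this kind could be produced with something like the sketch below,
assuming the per-frame binding-energy estimates have already been written
to a plain one-column text file; the filename is hypothetical.

    # Sketch: cumulative (running) average of a per-frame binding-energy
    # estimate, so a reader can judge convergence by eye.
    # Assumes a hypothetical one-column file of energies in kcal/mol,
    # one value per saved frame.
    import numpy as np
    import matplotlib.pyplot as plt

    energies = np.loadtxt("binding_energy_per_frame.dat")  # hypothetical file
    frames = np.arange(1, len(energies) + 1)
    running_mean = np.cumsum(energies) / frames

    plt.plot(frames, running_mean)
    plt.xlabel("Frames included")
    plt.ylabel("Cumulative average binding energy (kcal/mol)")
    plt.savefig("binding_energy_convergence.png", dpi=150)

If the running mean is still drifting at the right-hand edge of such a
plot, the reported value is clearly not converged, which is exactly the
information a reader needs.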

Now, what you really want to do is make your results bulletproof beyond
all reasonable doubt. So how could you do that? Well, you could run
another simulation from the same starting structure with a different
random seed and plot its cumulative average on the same plot. This gives
an idea of the spread in your results, i.e. how much to trust them. Note
that this is VERY different from just adding error bars to plots (most of
which are wrong, by the way) based on the number and range of points seen
in a single run. Such error bars are of little use, since you cannot
estimate the error in your results from a run that tells you nothing about
what you could have sampled but did not. Too many people report things
like 5.0 kcal/mol +/- 0.5 kcal/mol, which is simply wishful thinking and
gives the unwary reader far more faith in the result than they should
have. It is up to you, as the author of the work, to properly address the
uncertainty in the results. So a reviewer who asks for multiple repeats is
really doing the author a favor.
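
To make that contrast concrete (again an illustrative sketch, not anything
from the original emails), here is one way the seed-to-seed spread could be
compared with the naive single-run error bar; the filenames are
hypothetical placeholders for per-frame estimates from independently
seeded repeats.

    # Sketch: naive per-frame error bar from one run vs. the spread across
    # independently seeded repeats. Filenames are hypothetical.
    import numpy as np

    single_run = np.loadtxt("run_seed1.dat")  # per-frame estimates, one run
    naive_sem = single_run.std(ddof=1) / np.sqrt(len(single_run))
    # Treats every frame as independent, which correlated MD frames are not,
    # so it usually understates the true uncertainty.

    repeat_files = ["run_seed1.dat", "run_seed2.dat", "run_seed3.dat"]
    repeat_means = np.array([np.loadtxt(f).mean() for f in repeat_files])
    repeat_sem = repeat_means.std(ddof=1) / np.sqrt(len(repeat_means))
    # Run-to-run scatter reflects what a different seed could have given,
    # which is closer to the spread described above.

    print(f"naive per-frame SEM: {naive_sem:.2f} kcal/mol")
    print(f"seed-to-seed SEM:    {repeat_sem:.2f} kcal/mol")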

Now, a separate argument is how to improve sampling. I don’t have
concrete numbers, but what you need to ask is:

Does a single 500 ns simulation started from the crystal structure sample
phase space better than 10 x 50 ns simulations started from the same
crystal structure? I would offer that these are likely to be very similar.
However, one should always extrapolate such things to their limits. So how
does it compare to 100 x 5 ns? Or 1,000 x 0.5 ns? Or 10,000 x 50 ps?

I think most people would agree that the last one, possibly the last two,
would give you terrible sampling. So for multiple runs, where is the sweet
spot? That is the bit that will be system dependent and a function of what
you are looking to calculate. That said, what one really wants is to run
lots of simulations from an equilibrium set of starting geometries. So
what about running 500 ns, taking snapshots every 10 ns, and running each
of these for another XXX ns? This may give better sampling than starting
everything from the same structure; a rough sketch of pulling out such
snapshots follows below.
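
As a hedged sketch of that last idea, one could extract evenly spaced
snapshots from the long run and write each one out as a new starting
structure. The filenames, the assumed 10 ps frame-save interval, and the
use of MDAnalysis here are all illustrative assumptions; cpptraj could do
the same job.

    # Sketch: extract one frame every 10 ns from a long trajectory to seed
    # new, independently started runs. All filenames are hypothetical.
    import MDAnalysis as mda

    save_interval_ps = 10.0        # assumed interval between saved frames
    snapshot_every_ps = 10_000.0   # one seed structure every 10 ns
    stride = int(snapshot_every_ps / save_interval_ps)

    u = mda.Universe("system.prmtop", "long_run.nc")
    for i, ts in enumerate(u.trajectory[::stride]):
        # Each structure written here would be re-equilibrated as needed and
        # used as the start of a fresh run with its own random seed.
        u.atoms.write(f"seed_{i:03d}.pdb")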

Ultimately, though, I think the underlying motivation for a reviewer to
ask for multiple simulations has nothing to do with questioning whether
your sampling was reasonable. They are just asking you to prove, beyond
reasonable doubt, that your conclusions are reliable and reproducible and
are not based on some fortuitous one-off.