/ elang / concurrency / determinism

Uses of Determinism

(Derived from this email message)

This page represents speculative future plans for E. The features it describes are not currently supported. Until they are explored further, it is uncertain which of these are actually in E's future.

Models of Computation

The Turing and Von Neumann models of computation are strictly deterministic (as is that other Von Neumann model, cellular automata). In all of these systems, given the initial state of a computation and the specification of the machine it ran on, one can necessarily deterministically replay the computation.

Of course, this is true only because these models idealize-away many of the messy issues that plague real computation. Actors are an alternative foundational model of computation that idealizes-in many of the issues that these systems idealize-away. (Though the reverse is also true.)

The first such issue, explored by many besides Actors, is the contrast of "transformational" vs "reactive" computation. A compiler is transformational: all its inputs exist before it begins, it runs deterministically producing output, and terminates. A compiler that doesn't terminate is buggy.

A calculator is reactive: new inputs come in after it's started, these inputs effect both its state and its outputs. Its outputs may effect what further inputs it gets, so these outputs need to be transmitted before the calculator terminates. We do not require the calculator to terminate in order to be correct. Indeed, for a calculator, the opposite is true. Even though you can replay a calculator from a log of its inputs, you cannot model a calculator as something that computes from pre-existing inputs, since the inputs depend (via the external human) on previous outputs.

Actors are inherently distributed and asynchronous, and so are pervasively non-deterministic. Actors successfully idealize all non-determinism as arrival order non-determinism: Messages race towards an actor from multiple asynchronous sources. As Einstein would insist, these messages had no prior determined intrinsic full order. Nevertheless, they are delivered to the recipient actor in some partially arbitrary fully serialized order. In order to replay an actor, you not only need the messages sent to it by the external world, you need to know how these races were resolved.

Loggable Non-Determinism

An E Vat is an Actor, and at Vat granularity, E computation is Actor computation. Within a Vat we have approximately the singly threaded synchronous deterministic world of Von Neumann. It seems this is a good enough approximation that we could, under constrained but useful circumstances, arrange to segregate and log all the non-deterministic inputs to a Vat, or a subgraph of objects within a Vat, such that the remainder of the Vat's computation is deterministically replayable from these logged non-deterministic inputs.

We call this property loggable non-determinism -- the ability to deny to a (possibly adversarial) subgraph of objects any unlogged sources of non-determinism. Ideally, such a subgraph should not be able to escape deterministic replay.

Why would one want this property?

1) Cheaper Commitment.

Databases often use this technique by writing a "transaction log". Once a transaction request has been successfully logged, the transaction can be committed even though a crash may destroy its result state, given that the transaction can be deterministically reproduced from the logged request. I first saw this technique applied to committing general purpose computation by Rob Strom: Do a full checkpoint occasionally, and cheaply log inputs until you take the next checkpoint.

It's not clear whether this wins over KeyKOS/EROS mmu-based checkpointing, since that's also incremental. But is sure beats the crap out of non-incremental checkpointing, which is the only reasonable alternative for those sitting above the jvm.

( Read More... )

2) Cheaper Fault Tolerance.

This is really a variation of #1, where the understudy is separated in space rather than time. I first saw this technique applied by Auragen (which since disappeared), and since rather brilliantly by Rob Strom.

( Read More... )

3) Inward Bit Confinement.

Once we've plugged all means of communication accounted for by causality in our computational-model (or system-specification), all remaining ability to communicate must ride on informational inputs that are modelled as uncaused ("spontaneous"?). These are exactly the non-deterministic inputs to computation. In the confinement scenario, Alice generally isn't in a position to constrain Mallet (or verify that he is constrained), but she is in a position to constrain Bob (or verify that he is constrained).

If she constrains Bob to be deterministic except for non-deterministic inputs that she believes Mallet cannot influence, then she knows that Bob cannot receive covert information from Mallet. Therefore, Bob cannot receive instructions from Mallet for making use of those powers given to him by Alice, but which Alice wishes Mallet not to exercise. Even though Bob wishes to use these powers to do Mallet's bidding, he cannot find out what that bidding is.

( Read More... )

4) Debugging.

When a compiler demonstrates a bug, since you presumably have the program which elicited the bug, you can reproduce it. Assuming your debugger doesn't effect computation as experienced by the compiler, you should be able, after the fact, to reproduce the bug under the debugger; even though it's later, the bug didn't happen under the debugger, and no extra cost was spent to instrument the compiler. This effect, normally taken for granted, shows the power of deterministic replay at its best.

For non-deterministic computation, if we don't log non-deterministic inputs, we can't replay past computations. Such a non-deterministic program may drive us crazy with bugs that occasionally manifest themselves, but that we can never catch under the debugger, because it happens to take other paths then.

If we are willing to pay the instrumentation costs to log these inputs, then we can deterministically replay the remainder of the computation under a debugger, and watch a previous bug occur under the scrutiny of the debugger. Unfortunately, these instrumentation costs can be large. Fortunately, tricks #1 (commit) or #2 (fault tolerance) would already have us pay these costs, because they're still cheaper than the alternative.

For example, one can combine these techniques to build a server that is extremely resilient to its own bugs, and that really helps in finding these bugs. (We came up with this plan originally at EC, but never came close to implementing it.) The server process occasionally checkpoints, and also logs all non-deterministic inputs needed to replay it from one checkpoint to the next. It keeps at most three checkpoints on disk simultaneously. In order from oldest to youngest:

a) the one committed to.
b) the one being checked.
c) the one being written out.

as well as the two log-sequences a..b and b..c. Unlike scenario #1, the log doesn't let us commit at finer granularity that a full checkpoint. Rather, while it typically allows us to recover (from a crash) at this fine granularity, our only real commitment is to checkpoint #a. Once #b is completely written out, it is read-in/revived in a checking process which asks all the objects which can to check their internal consistency. If all these objects say everything's cool, we commit to #b and throw away #a and a..b. If we're not cool, then we save away #a, a..b, and #b for debugging, crash the server, revive it from #a, and continue the service forward.

This time, we hope that non-determinism will allow the live service to avoid hitting the bug again. We can even use our knowledge of where the non-determinism lies to ensure that other correct choices are made whenever possible, to make path divergence more likely. The very irreproducibility of non-deterministic bugs, that's normally such a liability to robustness, is potentially an asset for staying in operation while we figure out what's wrong.

To do this figuring out, we've got #a, which reported itself to be consistent, or we wouldn't have committed to it. We've got #b, with a known inconsistency. And we have the a..b log, which should allow us to scrutinize the computation as it proceeds from an apparently consistent state to an inconsistent one. Given the E model that non-deterministic inputs only enter as discrete events on the event loop, and given enough spare cycles, one could even automatically isolate, with the self-consistency checking, the event at which inconsistency becomes evident. Only the execution of this individual event would then need to be stepped through manually in a debugger.

( Read More... )

5) Contract Verification.

This is a new discovery, and is the most interesting. *** to be written

( Read More... )

Hash Tables and Free Will

*** to be written. Some notes:

I don't expect to do any of these immediately, but #3 (confinement) and #5 (contract verification) do become relevant in the near future. It turns out that #5 has a lot in common with #1 (checkpoint), #2 (fault tolerance), and #4 (debugging), but #3 (confinement) is curiously different. To do any of these, we need to do a good job identifying all the hidden sources of non-determinism. I thought I had, but recently hit a rude surprise: hash table enumeration order. Do KeyKOS and EROS also have this covert channel? (I think so.) Was this channel identified? Are it's dangers understood?

To plug this, I came up with a scheme, refined by Ping, to make hash tables deterministic. It's cheap but it's not free. However, it does force a non-upwards compatible change to the determination of whether two ConstMaps are the same. In all other ways, the proposed hash tables are upwards compatible to the current ones. I look forward to writing this up.

( Read More... )

Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.

/ elang / concurrency / determinism

ELib E Language Smart Contracts Related

Download FAQ API Mail Archive Donate

report bug (including invalid html)