The ground rules for this exercise (at least at this point) are:
- No static type analysis of E source code.
The optional SlotMaker expression in an E variable declaration is
intended, among other things, to allow the programmer to give a compiler
static type information about the values that the variable may hold.
A sophisticated compiler could make direct use of this, and indirect
use as an aid to infer further types. With static type knowledge,
a sophisticated compiler could generate better code than the simple
compiler proposed by ENative.
- Only simple scope analysis.
Crucial to the efficiency of E, even for a simple naive implementation
such as ENative, is partitioning variables into one of several
cases, and implementing variable definition, access, assignment,
and slot access differently according to which kind of variable we
have.
- No use of C++'s "virtual".
"virtual" is C++'s mechanism for run-time polymorphism,
and its notation hides some implementation cost. By avoiding it, we
ensure that all run-time polymorphism is provided only by C's pointer-to-function
mechanism, leaving all costs clear.
- No use of C++'s template feature,
except possibly for the various scalar Lists. The complexity isn't
worth it.
- No unnecessary use of C++'s references.
By always using pointers instead, indirection costs are always explicit.
Note: if the C++ compilers of interest normally generate better
code for references than pointers, which is quite possible, then this
decision may be revisited.
- Absorb, rather than reify, E's stack
discipline. Approximately following Brian
Smith's usage, by "absorb" I mean that E calls turn
into C++ calls, E (implicit) returns turn into C++ (explicit) returns,
and E exception throwing & catching turns into C++ exception throwing
& catching. In the other conventional language implementation
option -- reifying stack discipline -- we would implement E's implicit
stack state in explicit C++ data structures. By absorbing, we also
allow C++ native (primitive) code to be written in a style natural
to the C++ programmer.
- No portable virtual machine support
for stack debugging. This is one of the consequences of absorbing
E's stack discipline into C++'s. In the resulting code, there is no
data structure accessible to portable C++ code that explicitly represents
the C++ stack, and therefore no data structure that explicitly represents
the E stack. For a widely ported C++ compiler & debugger combo
such as gcc/gdb, it is conceivable one could provide "portable"
debugging access to the E stack by wrapping gdb's provision of "portable"
access to the C++ stack. I put "portable" in scare-quotes
to emphasize that this would be portable across hardware platforms
but not across C++ implementations.
It is fine for this meta-level access to be much slower than normal
computation.
- Portable meta-level access to
board-state in support of, for example, debugging and serialization
control. E computation within a vat occurs as a sequence of atomic
transactions called turns. Each turn consists of the synchronous
processing to completion of a pending delivery from that vat's vat-queue.
Stack state only occurs within a turn. The state of greater importance,
for most meta-level purposes, is the state present in a vat between
turns. If we think of a vat as a game board and a turn as a turn in
a game, the states during a turn (including stack state) are transitional:
"The pawn is in my hand, because I've picked it up but haven't
put it back down." We call the well-defined states between turns
board-states. A turn always computes from one board-state to
another.
A board-state consists of the states of the objects within the vat,
the state of the vat's pending delivery queue, the states of various
primitive "devices", and the state of a vat's comm system
-- managing the references that cross between this vat and other vats.
The E virtual machine will define portable means for accessing all
of this board-state (as well as manipulating some of it). ENative
must implement this spec using only portable C++ constructs, rather
than, as above, relying on features provided by a particular C++ implementation.
Fortunately, only object-state access is relevant to the subset of
E currently represented by ENative.
As above, it is fine for this meta-level access to be much slower
than normal computation.
- Assume a garbage collector.
There are so many ways to implement garbage collection, with
so many performance tradeoffs, that ENative simply defers this
issue for now and naively allocates (using C++'s "new")
without ever freeing. Some conservative garbage collectors are happy
to live underneath such C++ code and successfully garbage collect
without further programmer (or compiler) attention. The overhead of
a given garbage collection scheme can usually be approximated as a
function of the number of objects allocated. Any benchmarks run on
ENative would have to be adjusted by these estimates before they should
be taken seriously.
- Except for garbage collection, impose
all costs that a complete implementation would impose. ENative,
being for now an academic exercise, is, and may remain, far from a
complete implementation of Kernel-E. However, of the Kernel-E programs
that it does run, it should be at least as expensive as a complete
implementation of ENative would be. For example, the SmallInteger
add code checks for overflow. If overflow occurs, it throws a "BigInteger
Not Yet Implemented" exception rather than allocating
one, but this imposes all the costs on the successful case that a
complete implementation would impose. The resulting measured timing
may be taken as a reliable lower bound on performance (or upper bound
on expected running time) except for
- the above garbage collection issue,
- a larger memory footprint (and therefore potentially a worse instruction-cache
hit ratio),
- longer startup times, due both to the larger footprint and to
more static initialization. (But still expected to be trivial
compared to Java's startup time.)
- Impose costs of upgrade-for-prototyping.
This is really a special case of the above, but is listed separately
since it is the least obvious cost that would be imposed by a complete
implementation. To support rapid prototyping, the E specification,
like Smalltalk, supports the loading of replacement behavior for already
instantiated objects, and the conversion of these old object states
to become post-facto instances of the new behavior. As in at least
ParcPlace Smalltalk, this is done lazily, at the time the obsolete
objects are invoked. Unlike Smalltalk, E has not yet implemented this
feature. ENative does not either, but it does impose all the costs
on normal execution needed to support this feature.
Note that an upgrade (whether -for-prototyping or -for-release) only
becomes effective between turns, so it need only worry about old object-states
and old undelivered messages, but not old stack-frames. This is quite
fortunate, as there is nothing obvious one can do with a stale stack-frame following an upgrade. (A note to be put somewhere else: Under
normal conditions, an upgrade-for-release also need not deal with
old undelivered messages, since neither the vat-queue nor Eventual
references are checkpointed.)
- Compile rather than interpret,
but use runtime conventions that allow either. The ENative runtime,
corresponding approximately to ELib, accommodates both E code compiled
into C++ according to its conventions and the interpretation,
by a hypothetical byte-code interpreter, of E compiled to a hypothetical
byte code language. Indeed, I first designed ENative with a byte code
interpreter in mind before I switched to a compilation strategy. Compiling
to C++ has the obvious advantage of speed. A disadvantage that often
drives implementations towards interpreters is code-size blow up.
But by taking an absorption strategy, we manage to avoid this problem.
The remaining disadvantages of compilation are
- Dynamic code loading becomes non-portable.
This is much like the debugging issue above. A given widely-ported
C++ system, such as gcc, may provide a platform portable way to
dynamically load new code, but C++ code using this abstraction
would be specific to this C++ system, while being hardware-platform
independent. The E language spec requires dynamic code loading,
so a complete implementation of ENative must deal with this issue.
Were we to interpret byte-codes instead (or in addition), then
this issue would go away.
- Dynamic code unloading becomes
similarly non-portable. This is called out separately because
it would also interact with the assumed garbage collector.
- Call-site caching. Experience
with dynamic object language implementations, especially Smalltalk,
shows that if there's one "sophisticated" optimization that's
worth doing, it's "call-site
caching" (originally known as an "inline method cache",
invented by Deutsch and Schiffman and presented at POPL 1984). For
a given message call/send expression, if the dynamic implementation-type
(script) of the object is the same as it was last time, as it usually
is, then this call/send would look up the same method, so go there
directly. ENative demonstrates that this optimization can be coded
very simply. The flow of execution on a cache hit we call the fast-path.
The execution on a cache miss is the slow-path. These caches
must, of course, be invalidated by an upgrade-for-prototyping.