Scope Analysis
To compile E variable usage into C++ using ENative, the E compiler must
determine which of the several kinds of variable it is dealing with. But
before we can make these distinctions, we first need to introduce a few
definitions.
-
An allocation-contour is a coarsening of the normal
notion of a scope-contour in order to aggregate variable introduction
as much as possible without changing the semantics. In Kernel-E, an
allocation-contour is the scope contour around a method, matcher,
or loop. Given unique variable naming, the declaration of any non-outer-variable
(ie, any variable within a scope box) may be moved to the closest
enclosing allocation-contour without changing its semantics.
-
A lexical-composite is a group of objects defined in
the same allocation-contour. It is a special case of the composite
defined here.
-
Similarly, the objects defined within a lexical composite are deemed
lexical-facets. The objects defined directly
within a lexical composite (ie, not nested within an inner allocation
contour) are direct-lexical-facets of the lexical composite.
-
A frame is the state of a lexical-composite. It's the
union of the non-outer-variables used freely by any of the lexical-facets
of a lexical-composite. From directly within an object (ie, not within
a nested object) the frame holding the state of this object is the
object-frame.
An example facet/composite is:
    def getterSetterPair(var value) :any {
        def getter() :any { value }
        def setter(newValue) { value := newValue }
        [getter, setter]
    }
The getterSetterPair function defines an allocation-contour,
as do the two nested functions. Since getter and setter
are both defined directly within this allocation-contour (not within a
nested contour), they jointly form a lexical-composite, of which they
are the two direct-lexical-facets. The variables defined within this allocation-contour
are "value", "getter", and
"setter". However, only "value"
is used freely by any of the lexical-facets, so the lexical-composite's
frame holds just this one variable.
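The shape of this composite can be sketched in C++. This is only an illustration under simplifying assumptions, not ENative's actual output: Ref is modeled as a plain int rather than a Fat Pointer, the frame is a small struct instead of a state-array, and std::shared_ptr stands in for garbage collection. The names Frame and GetterSetterPair are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Assumption: a plain int stands in for ENative's Fat Pointer type.
using Ref = int;

// The frame of the lexical-composite. Only "value" is used freely by a
// lexical-facet, so it is the only variable stored here.
struct Frame {
    Ref value;
};

// The two direct-lexical-facets, returned together like [getter, setter].
struct GetterSetterPair {
    std::function<Ref()> getter;
    std::function<void(Ref)> setter;
};

GetterSetterPair getterSetterPair(Ref value) {
    // One frame per call; both facets hold a reference to the same frame.
    auto frame = std::make_shared<Frame>(Frame{value});
    return GetterSetterPair{
        [frame]() { return frame->value; },
        [frame](Ref newValue) { frame->value = newValue; },
    };
}

// Usage: a write through one facet is visible through the other.
inline void demoGetterSetterPair() {
    GetterSetterPair p = getterSetterPair(1);
    p.setter(42);
    assert(p.getter() == 42);
}
```

Note that "getter", "setter", and "newValue" never enter the frame; they remain ordinary C++ locals in this sketch.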
We now classify variable usage along several dimensions.
First, according to where the variable may live:
-
A Local is a variable defined in some allocation-contour,
whose current use is directly within this object or top-level expression
(ie, not within a nested object definition), and one that either hasn't
been optimized into a FastField (see below), or is final (so it doesn't
matter if it is also a FastField). Local variables are implemented
directly as C++ local Fat Pointer variables. "getter",
"setter", and "newValue"
above are Locals.
-
A Field is a state variable of a lexical-composite,
so it lives in the frame of that lexical-composite. It is used freely
by at least one lexical-facet of that composite. Fields are implemented
by indexing into the frame containing that field. From directly within
an object, a field of this object's object-frame is an instance-variable.
"value" is a Field of the above lexical composite.
From directly within getter or setter, "value"
is an instance-variable.
-
An Outer variable is one whose defining occurrence is
outside any scope box, and is therefore considered to be part of the
outer scope. Since this scope may be outside the control of the compiler,
we don't optimize this case at all, but rather fall back on the pure
naive computational model: Outer variables are implemented by explicitly
calling the Scope-object which is the outer scope of the top-level
expression being evaluated. "any" above is
an outer-variable.
Our one (transparent) optimization of outer variables is that, within
an object definition, we obtain access to the outer scope from the
enclosing object's Script rather than its state-array, even though
the outer scope is conceptually part of the object's state rather
than its behavior. This optimization places a severe limit on separate
compilation, as a Script would then be specific to the outer scope
in which the top-level expression was evaluated. For various reasons,
we were planning to make compilation this specific and this late anyway,
so this is fine.
Deslotifying
For Locals and Fields, an earlier phase of compilation is the deslotifying
source-to-source transformation. In the output of this transformation,
all non-outer-variables are declared only ":settable"
or ":final", to indicate whether or not they're
mutable. From a lambda-calculus perspective, a ":final"
variable is a pure lambda binding, while a ":settable" variable
is a Scheme-like lexically shared mutable location. The transformation
cases:
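Before           | After
-----------------+---------------------------------------------------
a :final         | same
a                |
a := v           |
&a               |
-----------------+---------------------------------------------------
a :final(vg)     | x :final ? vg.coerce(x) =~ a :final
a                | a
a := v           | a := v
&a               | &a
-----------------+---------------------------------------------------
a :settable      | same
a                |
a := v           |
&a               |
-----------------+---------------------------------------------------
a :settable(vg)  | x :final ? vg.coerce(x) =~ a :settable
a                | a
a := v           | a := vg.coerce(v)
&a               | SettableSlot(&a, vg)
-----------------+---------------------------------------------------
a :slotMaker     | x :final ? slotMaker.makeSlot(x) =~ a_Slot :final
a                | a_Slot.getValue
a := v           | a_Slot.setValue(v); v
&a               | a_Slot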
Each box consists of four lines, corresponding to the four variable usage
constructs below: definition, access, assignment, and slot-access.
XXX Note: the "After" code above needs a way to deal with coercion
failure. This probably requires a change to the coercion protocol.
Should we build a compiler capable of aggressive inlining of both code
and data, we may no longer need to deslotify as aggressively or
at all. Rather, many of the cases that follow could have been generated
by inlining Slots. Put another way, deslotifying, and the other optimizations
below, can be seen as special-cases for Slots of various general purpose
optimizations.
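As a rough C++ illustration of the ":settable(vg)" case, the following sketch models the guard as a function and SettableSlot(&a, vg) as a small class that re-applies the guard on every write. All of it is assumed for illustration: Ref is a plain int, the guard nonNegative is hypothetical, and coercion failure is signalled by throwing, which is just one possible answer to the open question about coercion failure.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

using Ref = int;                        // assumption: stand-in for a Fat Pointer
using Guard = std::function<Ref(Ref)>;  // assumption: vg.coerce modeled as a function

// Hypothetical guard: accepts only non-negative values.
inline Ref nonNegative(Ref x) {
    if (x < 0) throw std::invalid_argument("coercion failed");
    return x;
}

// Models SettableSlot(&a, vg): a view onto the variable's storage that
// coerces every value written through it.
class SettableSlot {
public:
    SettableSlot(Ref* storage, Guard guard)
        : storage_(storage), guard_(std::move(guard)) {}
    Ref getValue() const { return *storage_; }
    void setValue(Ref v) { *storage_ = guard_(v); }
private:
    Ref* storage_;
    Guard guard_;
};

// Walks through the four deslotified constructs for a guarded settable variable.
inline Ref guardedVariableDemo(Ref specimen, Ref v, Ref w) {
    Guard vg = nonNegative;
    Ref a = vg(specimen);       // definition: coerce the specimen, then bind a
    a = vg(v);                  // assignment: a := vg.coerce(v)
    SettableSlot slot(&a, vg);  // slot access: SettableSlot(&a, vg)
    slot.setValue(w);           // writes through the slot are coerced too
    return a;                   // access: plain a
}
```

After deslotifying, the variable itself is an unguarded location; the guard survives only at the points where a value is written.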
Allocation Type
Following this transformation, our variable usages may now be further
classified according to where their storage is allocated.
-
A Boxed variable needs separately allocated storage
for one mutable Fat Pointer. The compiler generates code to access
and assign to this variable by directly accessing and assigning to
this separately allocated storage. We refer to this separate storage
as a Box. An example would be a mutable variable (one
declared ":settable" after deslotifying) that
is used by both a direct and an indirect lexical-facet of its defining
lexical-composite. Another example would be a mutable Local for which
there exists a slot-access expression. (Are there any other cases?)
As we will see below, a Box can also serve as the state for two primitive
kinds of Slots: a settable Slot, when a pointer to a Box is paired
with a SettableBoxScript, and a final Slot, when a pointer to a Box
is paired with a FinalBoxScript.
-
A Fast variable is one for which the compiler was able to determine
that a Box is unnecessary. The variable's value may be stored where
the reference to the Box would have been stored, and it is accessed
and assigned by C++-level access and assignment to this location.
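To see why the distinction matters, here is a small C++ sketch of a mutable variable shared by two users. It is an analogy rather than generated code: Ref is a plain int, closures stand in for lexical-facets, and the names Counter, makeBoxedCounter, and makeUnboxedCounter are hypothetical. The second function shows what would go wrong if a variable that needs a Box were stored as if it were Fast.

```cpp
#include <cassert>
#include <functional>
#include <memory>

using Ref = int;  // assumption: stand-in for a Fat Pointer

struct Counter {
    std::function<void()> increment;
    std::function<Ref()> read;
};

// Boxed: the variable lives in separately allocated storage, and each user
// holds a pointer to that one Box.
Counter makeBoxedCounter(Ref initial) {
    auto box = std::make_shared<Ref>(initial);
    return Counter{[box]() { *box += 1; }, [box]() { return *box; }};
}

// Mistaken "Fast" treatment of the same variable: each user stores the value
// itself where the Box pointer would have been, so the two copies diverge.
Counter makeUnboxedCounter(Ref initial) {
    return Counter{[v = initial]() mutable { v += 1; },
                   [v = initial]() { return v; }};
}

// Usage: only the boxed version lets the reader observe the writer's updates.
inline void demoBoxedVersusUnboxed() {
    Counter boxed = makeBoxedCounter(0);
    boxed.increment();
    assert(boxed.read() == 1);

    Counter unboxed = makeUnboxedCounter(0);
    unboxed.increment();
    assert(unboxed.read() == 0);
}
```

For a ":final" variable the copies can never diverge, which is why storing the value in place of the Box pointer is always safe for final variables.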
Crossing these, we get the following five kinds of variable usage:
        | Fast        | Boxed
--------+-------------+-------------
Locals  | FastLocals  | BoxedLocals
Fields  | FastFields  | BoxedFields
Outers  | Outers
Variable Usage Constructs
Deslotified Kernel-E has four constructs for using variables:
-
Variable Definition. The defining occurrence of a variable
occurs only in a FinalPattern or a VarPattern:
    FinalPattern:  varName :ValueGuardExpr
    VarPattern:    var varName :SlotGuardExpr
In both cases, when this pattern is matched against a specimen, the
match always succeeds and the specimen becomes the initial value of
the variable. The difference between ":settable" and ":final"
only affects the other variable usage constructs.
-
Variable access. This is simply the use occurrence of
a variable name as an expression (for language history weenies, an
"rValue"). For example, "a" in "a
+ b" is an access to the variable named "a".
-
Variable assignment. A use occurrence of a variable
on the left side of an assignment expression. The assignment expression
as a whole has the value the expression on the right evaluates to,
but we don't bother to show this in the implementation sketch below.
This detail can often be optimized out anyway, as an E compiler should
notice that most assignments are evaluated only for effect. It is
a static error to assign to a variable declared ":final".
Such programs must be rejected at compile time.
-
Slot access. The "&name"
expression evaluates to a Slot for accessing or modifying the value
of variable "name". If the variable
is declared ":settable", the returned Slot
object will respond to both getValue and setValue(newValue)
by accessing and modifying the value of the variable. If the variable
is declared ":final", the Slot object will
only respond to getValue.
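The difference between the two kinds of Slot can be sketched in C++ as follows. The class shapes are assumptions made for illustration: Ref is a plain int, the Slot interface is hypothetical, and a final Slot's refusal of setValue is modeled by throwing, whereas the text only says that such a Slot does not respond to that message.

```cpp
#include <cassert>
#include <stdexcept>

using Ref = int;  // assumption: stand-in for a Fat Pointer

// Hypothetical Slot interface with the two messages named in the text.
class Slot {
public:
    virtual ~Slot() = default;
    virtual Ref getValue() const = 0;
    virtual void setValue(Ref newValue) = 0;
};

// Slot for a ":final" variable: it can hold a copy of the value, because
// the value never changes. setValue is refused.
class FinalSlot : public Slot {
public:
    explicit FinalSlot(Ref value) : value_(value) {}
    Ref getValue() const override { return value_; }
    void setValue(Ref) override {
        throw std::logic_error("final slot does not respond to setValue");
    }
private:
    Ref value_;
};

// Slot for a ":settable" variable: it must share the variable's storage so
// that reads and writes on either side stay in sync.
class SettableSlot : public Slot {
public:
    explicit SettableSlot(Ref* storage) : storage_(storage) {}
    Ref getValue() const override { return *storage_; }
    void setValue(Ref newValue) override { *storage_ = newValue; }
private:
    Ref* storage_;
};

// Usage: writing through a settable Slot changes the variable itself.
inline void demoSlots() {
    Ref name = 1;
    SettableSlot slot(&name);
    slot.setValue(2);
    assert(name == 2);
}
```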
The Cases
Each case has four rows, corresponding to the four variable usage constructs.
The upper left box will show all the variable declarations that this case
applies to -- one per line. Other row-entries for that case either contain
the same number of lines, meaning they apply to variables with the respective
declarations, or they are a single line, meaning they apply to all of
that case's possible variables.
FastLocals
A C++ local Fat Pointer variable is used to hold the value of the variable.
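E                 | C++
------------------+----------------------
name :settable    | Ref name = specimen
name :final       |
------------------+----------------------
name              | name
------------------+----------------------
name := newValue  | name = newValue;
------------------+----------------------
&name             | new FinalSlot(name)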
BoxedLocals
A C++ local thin-pointer points at storage for the Fat Pointer holding
the variable's value. We need not consider BoxedLocals declared ":final",
as final local variable usage will always be FastLocal usage.
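E                 | C++
------------------+-------------------------------------
name :settable    | Ref * namePtr = new Ref(specimen);
------------------+-------------------------------------
name              | *namePtr
------------------+-------------------------------------
name := newValue  | *namePtr = newValue;
------------------+-------------------------------------
&name             | Ref(SettableBoxScript, namePtr)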
FastFields
An element of a state-array holds the current value of the variable.
The state array itself is accessed by a C++ local thin pointer variable,
here named "frame". When the frame is our object-frame
(the frame holding the instance variables for the current object), it
is initialized as
Ref *frame = self.myData.word.myField;
otherwise it is initialized at the time the frame is allocated:
Ref *frame = new Ref[numVars];
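E                 | C++
------------------+----------------------------------------
name :settable    | frame[index] = specimen
name :final       |
------------------+----------------------------------------
name              | frame[index]
------------------+----------------------------------------
name := newValue  | frame[index] = newValue;
------------------+----------------------------------------
&name             | Ref(SettableBoxScript, &frame[index])
                  | Ref(FinalBoxScript, &frame[index])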
Note that "&name" on a FastField variable
returns a Slot object that points directly into the middle of the
frame in order to point at the variable's storage. This technique requires
a garbage collector able to handle pointers into the middle of allocated
blocks. When using a more limited garbage collector, an explicit FramedFieldSlot
object should be allocated that points to the frame as a whole as well.
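One way to picture the FramedFieldSlot alternative in C++ is with the aliasing constructor of std::shared_ptr, which yields a pointer to one element while sharing ownership of the whole container. This is a sketch under assumptions, not the ENative design: Ref is a plain int, the frame is a std::vector, reference counting stands in for the garbage collector, and the class layout is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Ref = int;                 // assumption: stand-in for a Fat Pointer
using Frame = std::vector<Ref>;  // assumption: frame modeled as a vector

// Hypothetical FramedFieldSlot: addresses a single field, but also keeps the
// frame as a whole reachable, so no interior-pointer support is needed.
class FramedFieldSlot {
public:
    FramedFieldSlot(const std::shared_ptr<Frame>& frame, std::size_t index)
        // Aliasing constructor: shares ownership of *frame while pointing
        // at the element frame->at(index). at() throws if index is invalid.
        : field_(frame, &frame->at(index)) {}
    Ref getValue() const { return *field_; }
    void setValue(Ref newValue) { *field_ = newValue; }
private:
    std::shared_ptr<Ref> field_;
};

// Usage: the slot stays valid after every other reference to the frame is gone.
inline void demoFramedFieldSlot() {
    auto frame = std::make_shared<Frame>(Frame{10, 20, 30});
    FramedFieldSlot slot(frame, 1);
    frame.reset();  // the slot alone now keeps the frame alive
    assert(slot.getValue() == 20);
}
```

The trade-off is the one the text describes: the slot costs an extra reference to the frame in exchange for not needing a collector that understands interior pointers.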
BoxedFields
The state-array contains a reference to an arbitrary object expected
to exhibit Slot behavior. Access or assignment is by explicit message
send to this Slot object. We need not consider BoxedFields declared ":final",
as final field variable usage will always be FastField usage.
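E                 | C++
------------------+------------------------------------------------------------
name :settable    | frame[index] = Ref(SettableBoxScript, new Ref(specimen));
------------------+------------------------------------------------------------
name              | frame[index].myData.word->myBox
------------------+------------------------------------------------------------
name := newValue  | frame[index].myData.word->myBox = newValue;
------------------+------------------------------------------------------------
&name             | frame[index]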
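The four BoxedField rows can be mimicked in C++ as below. This is a loose model under stated assumptions rather than the real representation: Ref is a plain int, a Box is a std::shared_ptr to one Ref instead of a Fat Pointer paired with a SettableBoxScript, and the helper function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Ref = int;                   // assumption: stand-in for a Fat Pointer value
using Box = std::shared_ptr<Ref>;  // assumption: separately allocated storage for one Ref
using Frame = std::vector<Box>;    // a frame whose fields are all boxed

// Definition: allocate a fresh Box for the specimen and store it in the frame.
inline void defineBoxedField(Frame& frame, std::size_t index, Ref specimen) {
    assert(index < frame.size());
    frame[index] = std::make_shared<Ref>(specimen);
}

// Access: follow the frame element to its Box and read the value.
inline Ref readBoxedField(const Frame& frame, std::size_t index) {
    return *frame[index];
}

// Assignment: follow the frame element to its Box and overwrite the value.
inline void assignBoxedField(Frame& frame, std::size_t index, Ref newValue) {
    *frame[index] = newValue;
}

// Slot access: the frame element already is the slot, so hand it out as-is.
inline Box slotOfBoxedField(const Frame& frame, std::size_t index) {
    return frame[index];
}
```

Because slot access returns the frame element itself, a holder of the slot and the code using the variable always see the same storage, at the cost of one extra indirection on every access.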
Outers
Variable usage is by explicitly messaging a scope object that represents
the top level scope of the lexically enclosing top level expression. This
scope object is accessed through the C++ local Fat Pointer variable "Outers".
Within an object expression, Outers is initialized by
Ref Outers = self.myScript->myOuters;
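E                    | C++
---------------------+--------------------------------------------------------
name :slotMakerExpr  | Ref initSlot = slotMaker.call(&DoMakeSlot, specimen);
                     | Outers.call(&DoDefineSlot, Ref("name"), initSlot);
---------------------+--------------------------------------------------------
name                 | Outers.call(&DoGet, Ref("name"))
---------------------+--------------------------------------------------------
name := newValue     | Outers.call(&DoPut, Ref("name"), newValue);
---------------------+--------------------------------------------------------
&name                | Outers.call(&DoGetSlot, Ref("name"))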
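The naive model behind these rows, where every outer-variable use is a by-name request to a scope object, might look roughly like this in C++. The Scope class and its method names are hypothetical stand-ins for the DoDefineSlot, DoGet, DoPut, and DoGetSlot messages; Ref is a plain int, and an undefined name is reported by throwing, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

using Ref = int;  // assumption: stand-in for a Fat Pointer

// Hypothetical minimal slot: just a mutable value.
struct SimpleSlot {
    Ref value;
};

// Hypothetical scope object: maps variable names to slots.
class Scope {
public:
    // Corresponds to DoDefineSlot.
    void defineSlot(const std::string& name, std::shared_ptr<SimpleSlot> slot) {
        slots_[name] = std::move(slot);
    }
    // Corresponds to DoGetSlot.
    std::shared_ptr<SimpleSlot> getSlot(const std::string& name) const {
        auto it = slots_.find(name);
        if (it == slots_.end()) {
            throw std::out_of_range("undefined outer variable: " + name);
        }
        return it->second;
    }
    // Corresponds to DoGet.
    Ref get(const std::string& name) const { return getSlot(name)->value; }
    // Corresponds to DoPut.
    void put(const std::string& name, Ref newValue) { getSlot(name)->value = newValue; }
private:
    std::map<std::string, std::shared_ptr<SimpleSlot>> slots_;
};

// Usage: define, assign, and read an outer variable purely by name.
inline void demoOuters() {
    Scope outers;
    outers.defineSlot("name", std::make_shared<SimpleSlot>(SimpleSlot{1}));
    outers.put("name", 2);
    assert(outers.get("name") == 2);
}
```

Every use pays for a lookup by name, which is the price of leaving the outer scope outside the compiler's control.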