(The old PDF document, The Kernel-E Language Reference Manual, is now out of date, but still contains information we have not yet migrated to these pages. It is deprecated but useful.) The E language is specified in layers. At the bottom is the Kernel-E language. The Kernel-E language is a subset of the regular E language -- every program written in Kernel-E is also a valid E program with the same meaning. The remainder of E's grammar outside the kernel subset is E's sugar (see The E Language Grammer). The semantics of the sugar is defined by canonical expansion to Kernel-E. This expansion happens during parsing -- E parse trees only contain nodes defined by Kernel-E -- so only these are executed by the virtual machine. To give a semantics of Kernel-E it suffices to write an executable specification of the virtual machine as an interpreter of such parse trees. Following a venerable tail-biting tradition, this chapter presents such an interpreter written in the full E language. (An interpreter written in the same language it interprets is called a Meta-Interpreter.) Unfortunately, this does cause some circular-definition ambiguity. In Brian Smith's terminology[?], the interpreter absorbs some issues by mapping them onto the same issues in the language in which the interpreter is written. When these are the same languages, this leaves some issues unresolved. For the moment, we resolve these ambiguities only informally in the text. (The bootstrap E interpreter is essentially a transliteration into Java of the interpreter presented here. Since Java's definition does not depend on E, and the source to this bootstrap interpreter is available, this should adequately resolve these circular ambiguities for present engineering purposes.) Since this chapter is not concerned about surface syntax, the BNF statements of the kernel productions leave out non-structural grammatical detail, such as issues of precedence and associativity. See The E Language Grammar for these. E Kernel Language Quick Reference CardIn the pseudo-BNF used here, Terminals (tokens emitted by the lexer) are either quoted bold-faced strings, or are names that begin with an upper-case letter. Non-terminals (forms defined by this grammar) are names that begin with a lower-case letter. Square brackets around a form means the appearance of the form is optional. An asterisk suffix means zero or more repetitions of that form, and a plus sign suffix means one or more repetitions. Since we define our semantics in terms of parse trees (technically, abstract syntax trees), we need to be concrete about their form. As explained here, E is moving towards using Minimal-XML DOM trees as its normal parse tree representation, including for E itself. The Kernel-E DTD is the current draft XML DTD for representing Kernel-E programs. Each of the following pages displays the part of this DTD that pertains to it.
Productions beginning with an upper-case letter correspond to terminals in the grammer, and to Minimal-XML elements containing character data. Productions beginning with a lower-case letter correspond to non-terminals, and to Minimal-XML elements containing other elements.
Meta-Interpreter SetupSpecification by Meta-Interpreter EnhancementDefining a computational model that deals with security, upgrade, and debugging in one step is too hard. Rather, we take it in stages. There is a danger of losing security when introducing support for either upgrade or debugging, so we first present -- as the contents of this chapter -- a meta-interpreter for a secure but non-upgradable, non-debuggable E. In Brian Smith's terminology, this meta-interpreter reifies eval but absorbs apply and capability security. Borrowing some terminology from Jonathan Rees, eval is the means by which an object changes it state and decides what actions it would like to take on the world outside of itself -- it is what happens inside an object. apply is the means by which an object takes such external actions -- it is what happens between objects. By absorbing apply, interpreted subworlds can work transparently with non-interpreted contexts. In E, the apply functionality is provided by the callExpr and sendExpr constructs. (This is the also basis for the transparent inter-operation of Java and E objects: ELib implements E's inter-object semantics, and it is used both as the E language's runtime library, and directly as a library called by Java programs. Neither caller nor callee knows whether the other is implemented in E or Java.) This enables us to define upgrade and debugging support as enhanced meta-interpreters, such that we can design and understand the resulting security properties. A real implementation can then provide the behavioral equivalent of allowing (not requiring) code to be run under such an enhanced interpreter. This clearly continues to be a faithful and secure implementation of the semantics specified by the unenhanced interpreter, since one could have run the enhanced interpreter on top of it. The resulting enhanced interpreter gives us clear answers to the "who is allowed to do-or-see what?" issues needed for debugging securely. Basically, the instantiator of an interpreted subworld holds the only debugging capabilities to that world. For anyone else to have debugging access, they must get it from the interpreter-starter. However, the interpreter-starter can only obtain a debugger's-eye-view on an object if they also have the object. Not surprisingly, this follows KeyKOS discipline rather well. Ironically, these same debugging hooks also enhance security, by providing us a discretion check enabling confinement. (** explain & refer somewhere **) Name Spaces
Meta-Interpreter Skeleton: EvaluationAs with the Lambda Calculus and related languages (most notably Scheme), the heart of computation within an object is the evaluation of an expression in a lexical scope. E's expressions are exactly those kernel parse node types defined by the eExpr production above. In the Lambda Calculus, a scope is an immutable mapping from names (in E, nouns) to values. In the non-debuggable, non-upgradable E presented here, a scope is also an immutable mapping from nouns to values, but these values are termed slots, since they are expected to hold a variable's value. Slots are similar to locations in Scheme's semantics, except that the E programmer can bind whatever object they want to be a noun's slot, and can obtain the slot of an in-scope noun. A primitive mutable slot type is available -- and used by default in the expansion of the sugar language -- enabling classic side effects by assignment. So altogether, we have four distinct indirections to get from a variable name to the object it designates: 1) Static CorrespondencenounExpr -> finalPattern or varPattern The statically analyzable correspondence between use-occurences of a variable names (noun expressions) and a defining occurence of the same name (a finalPattern or varPattern). In the kernel language, this strictly follows the left-to-right definer to corresponding close curly rule. The interpreter presented below makes less use of this static analyzability than it probably should, instead managing scopes carefully so that the run-time name/scope uniqueness exactly matches the static correspondence rule. 2) Lexical Loopupscope[identifier] -> slot A scope is an immutable mapping from names to slots. Each time an expression is evaluated, a distinct scope is provided, so it will often look up a distinct slot. For example, the equivalent of object's instance variable storage is a scope allocated per-instance that maps instance variable names to slots holding the values. (Note that E has no distinct notion of instance variables. The effect described is an outcome of the semantics presented here.) 3) Mutable Stateslot.get() -> reference # reading a variable's value slot.put(expr) # assignment When a variable name is used in a noun expression (ther than on the left-side of an assignment), this turns into a slot lookup as described above, followed by asking that slot for its current value. Assignment is turned into a slot lookup, followed by asking the slot to change its value. A slot-expression skips step three, and just returns the slot itself as the value of the expression. Variables can vary precisely because slots can change their value over time. 4) Designation objectreference -> object The value returned by step 3 shouldn't be thought of as the object itself, but as a reference (or pointer, or capability) to the object. When drawing a heap of objects, we typically depict the objects themselves as circles or blobs, and objects references as arrows. If object Alice points at object Bob, then Alice holds the tail of an arrow whose head is attached to Bob. Since E is a distibuted language, we give these arrows more semantics than usual: in particular, Alice and Bob can be on different machines, in which case the reference will span machines. Such EVENTUAL references have a restricted semantics that reflect the inescapable difficulties of distributed computing -- like partial failure. This is documented in the E Reference Mechanics. In Lambda Calculus, expression evaluation takes in an expression and a scope, and produces a value. In E, expression evaluation similarly takes in an expression and a scope, but produces an outcome. The three forms of outcome are
The following meta-interpreter absorbs the call-return stack discipline, so that returning successfully in the interpreted language is represented by eval returning successfully, and non-local exit in the interpreted language is represented by eval performing the same non-local exit. The natural object-oriented way to define the eval function would be to distribute it into a set of eval methods defined on each expr-parse-node type. (Indeed, the bootstrap interpreter written in Java does exactly this.) However, this has the wrong extensibility property: we wish to hold the definition of the kernel language fixed over long periods of time, while we also expect to define enhanced semantics for executing E -- especially for debugging and upgrade -- by enhancing this interpreter. Therefore we need to enable multiple interpreters to co-exist in one system, and to allow the others to be defined by incremental modifications to this one. To do this, we write the interpreter as a simple function containing a big switch. Each branch of the switch matches one of the expression types above, and is shown in the corresponding section: def eval(expr, scope) { switch (expr) { ... match e'...' { ... } ... } } Meta-Interpreter Skeleton: Pattern MatchingIn the Lambda Calculus, defining occurences of nouns only appear in parameter lists. Most programming languages also have a variable declaration/definition construct that can appear in a block of code. Where classically one has defining occurences of variable names, E instead has patterns. Where one classically binds a name to an initial value -- by argument matching or initialization -- E instead pattern matches the pattern against the initial value. When the pattern is the expansion of a simple Identifier, the effect is identical to the classical case. In other words, the classic parameter variable declarations and local variable definitions are degenerate (but common!) forms of E's pattern matching mechanism. As with eval we define E's pattern matching routine, testMatch, with a global function that dispatches (using E's switch) on the type of the pattern. The Java implementation of the bootstrap interpreter instead distributes the clauses into the Pattern parse-node types. def testMatch(patt, scope, specimen) { switch (patt) { ... match e'...' { ... } ... } } The testMatch function takes in a Pattern parse-tree, a scope as the context in which to perform the match, and a run-time value to match it against. The outcomes:
As we will see below, by returning a one-element array for the success case, rather than just returning the result directly, we enable the use of matchBind expressions in the interpreter to both test for success and bind the result in a compact fashion. Miscellaneous Interpreter FunctionsThe mustMatch function is like testMatch, except that an unsuccessful match results in failure rather than returning null, and therefore a successful match can just return the scope directly. If mustMatch returns, it will always return a scope. def mustMatch(patt, scope, specimen) { def [result] := testMatch(patt, scope, specimen) result } Defining Behaviors: Method & MatcherClassically, an object is an encapsulated package of state and behavior, and E's semantics reflects this faithfully. Objects are defined by the object expressions, of which there are two kinds: methodical expressions -- for defining methodical objects, and plumbing expressions, for defining objects that act as message plumbing. An object expression evaluates in a scope to an object that behaves as described by the object expression. For example, the following program in the full E language def makePoint(x, y) :any { def self { to getX() :any { return x } to getY() :any { return y } } return self } Sample command-line interaction in the context of the above definition: ? def pt := makePoint(3, 5) ? pt.getX() # value: 3 expands into the following Kernel E program: def Point { to run(x y) :any { def self { to getX() :any { return x } to getY() :any { return y } } return self } } ? def pt := makePoint.run(3, 5) ? pt.getX() # value: 3 In the original program, it seems we've defined Point to be a function. In the expansion, we see that Point is actually an object with a single run method that takes two arguments. After all, kernel-E is a pure object language: all values are objects, and all inter-object interaction (with the exception of equality) is purely by message sending. However, E's sugar streamlines the use of objects to express conventional functional/procedural patterns. run is E's default verb. If left out, it will be supplied in the expansion to the kernel language. An apparent function definition actually defines an object with a run method. The behavior of Point's run method is to define and return an object (named self within the scope of the run method, and therefore named self to itself). By behavior, we mean this is what Point does in response to a run message of two arguments. This returned object's behavior is similarly described by two methods: getX and getY. (Another tasty bit of sugar is that zero-argument argument lists may be left out of both definition and call. However, you may not leave out both the verb and the argument list.) This returned object acts like a conventional point object, but where does the state of the object come from? Somehow, x and y are acting like its classical instance variables, but there doesn't seem to be anything special about their declaration. Indeed there isn't. The scope of x and y extends from their definition until the close curly at the end of the run method. The self object expression within this scope can therefore refer to these slots by using their names. So the state-nouns of an object expression are those nouns used within the object expression that statically correspond (see above) to noun definitions outside the object expression. An object expression evaluates in a scope to an object. The object's state is the subset of this scope containing the object expression's state-nouns. When an object receives a message, it executes a corresponding eMethod or matcher in a scope which is a child of this state. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Unless stated otherwise, all text on this page which is either unattributed or by Mark S. Miller is hereby placed in the public domain.
|